(57) A structured document generating method
and apparatus capable of easily generating a structured
document matching the document structure of each
non-structured document, by using a rule directly generated
from a preset document structure definition for the
conversion of the non-structured document into the
structured document. A keyword extracting module (102)
extracts a keyword representative of the document
structure from a non-structured document (101) by using
a keyword extracting rule (103), and a keyword/text
model (104) is generated which is described by two
kinds of elements, keywords and other strings. A
parsing module (105), generated by a process (113) of
automatically generating a parsing module by
referring to a parsing rule (111) generated by
modifying and converting a DTD (106), performs a parsing
process relative to the keyword/text model (104) to
generate an interim SGML document (114). An SGML
document correcting module (115) modifies the interim
SGML document (114) and generates a final output of an
SGML document by referring to DTD difference information
(109) generated when the parsing rule was generated.
(54) Method and apparatus for generating structured document

Description

BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention generally relates to management
of documents having a regular document format,
such as legal documents, and particularly to a method
and apparatus for generating a structured document
from a non-structured document. A "non-structured
document" is a document, entered through character
recognition, a word processor, or the like, which does
not contain information explicitly showing its structure.
A "structured document" is a document which contains
information explicitly showing the structure of the
document.

Description of the Related Art 

In a known method of generating a structured document,
information explicitly showing the document
structure is embedded in the text. Generally, a document
generated by a user (hereinafter called a "document
instance") often contains a portion designating a file
which describes a document structure definition, and a
text content portion. The document structure definition
defines the document structure and the marks indicating
its elements (such a mark is hereinafter called a "tag").
The document structure definition is often set so that
the structured document can be used efficiently. The tags
defined by the document structure definition are inserted
into the text content portion in order to explicitly express
the document structure and to uniquely determine the
string which is the element of the document structure
indicated by each tag.

In outputting a document instance structured in the 
above manner, an image to be output is generated by 
referring to a file which describes a layout definition
defining what format is used for outputting each compo- 
nent (hereinafter called an "element") of the document 
structure. In this method, the document instance and 
the layout definition are independent so that any docu- 
ment instance can be used irrespective of the type of an 
apparatus or system to be used for the output. 

The contents of a string of a structured document 
are explicitly expressed by inserting tags such as
<author name> and <title>, each of which is in one-to-one
correspondence with an element. Therefore, in combination
with a tool such as a full text search system for struc- 
tured documents, an aggregation of document 
instances themselves can be used as a database, and 
the document contents can be added or changed easily. 
Even if part of this database is lost by some failure, it is 
possible to know that this database has a lost portion, 
by comparing the original document structure defini- 
tions with the database of document instances. 

Because of these advantages, structured documents
are widely used for document management in
document processing systems which store and use a
large number of documents. Along with this, several
approaches have been proposed to convert
non-structured documents, such as existing paper
documents and documents entered by a word processor,
into structured documents.

JP-A-62-249270 and "Method of Converting Document
Image into ODA Structured Document" (Journal of
Papers of The Institute of Electronics, Information and
Communication Engineers, D-II, Vol. J76-D-II, No. 11,
pp. 2274-2284) propose the following method. First, the
field of the document type of a document is restricted.
Next, a structured document is generated by using a
document structure common to the restricted field
(hereinafter called a "common document structure")
and a document structure analysis rule.

With this method, a document structure usable in
common in each field of documents, such as "technical
document" and "business document", is set. Then, a
document structure analysis rule is manually generated
in order to analyze a non-structured document and
extract its document structure. By using the document
structure analysis rule, the non-structured document
is converted into a document instance matching
the common document structure. If there is an element
which is specific to the structure of an individual
document (hereinafter called an "individual document
structure") and cannot be expressed by the common
document structure, the document instance matching the
common document structure is converted into a document
instance matching the individual document structure.

With this method, however, the document structure
subjected to the document structure analysis and the
document structure analysis rule depend upon
the field of the non-structured document. Therefore, in
order to process a document in a different field, a
document structure analysis rule for that field must
be newly generated manually. This work requires a
large amount of labor.

This method uses a single document structure
analysis rule considered to be highly common to a
plurality of types of documents in a specific field.
Therefore, this single document structure analysis rule
is not always optimum for each document, and an element
specific to an individual document structure cannot be
analyzed directly. In this case, it becomes necessary,
after the document structure analysis, to convert the
document instance again into another document instance
matching the individual document structure. Specifically,
tags of the first generated document instance are
added, changed, or deleted. This work generally
requires complicated operations and hence a large
amount of labor.

Further, this method provides no support for
generating a rule for extracting a keyword. Therefore,
the elements to be treated as keywords must be manually
determined, and the conditions of layout and string
necessary for extracting a keyword must also be manually
set.

Still further, this method does not provide means for
supporting the determination of which element is to be
a keyword (hereinafter called a "keyword-corresponding
element"). Elements which contain string data are not
always extracted as keywords. Elements having no
characteristic layout or string are not extracted as
keywords, but are dealt with as a string between
keywords, i.e., a non-keyword.

The restriction condition that "non-keywords should
not be contiguous in a document instance" is imposed
when determining which elements are keyword-corresponding
elements. This is because a non-keyword
is a "string between keywords" and is therefore
required to be always contiguous to a keyword. However,
conventional methods have no means for automatically
checking whether an aggregation of elements
determined as keyword-corresponding elements satisfies
the restriction condition. If the aggregation of these
keyword-corresponding elements does not satisfy the
restriction condition, defects or errors occur
when the rule for document structure analysis
is generated or when the document structure is
analyzed. It is then necessary to determine the
keyword-corresponding elements again. This cycle must
be repeated until a proper aggregation of
keyword-corresponding elements is set.

Lastly, this method provides no support for setting the
conditions of layout and string necessary for the
extraction of a keyword. It is therefore necessary to
manually collect the information necessary for the
extraction of a keyword from the non-structured document
itself, or from rules or the like defining the format of
the non-structured document. This requires a large
amount of labor.

JP-A-6-290173 gives the following description. A
document structure indicating each element of a
labeled document is generated by referring to a
"schema" describing restricting information of the
document structure, and then a structured document is
generated.

In JP-A-6-290173, however, although use of the
schema describing restricting information of the
document structure is described, how the schema is
generated is not described.



SUMMARY OF THE INVENTION

It is an object of the invention to solve the above 
problems and enable proper document structure analy- 
sis of documents of a plurality of fields. 

It is another object of the invention to directly analyze
elements specific to the individual document structure
and to enable direct generation of a document
instance matching the individual document structure.

It is a further object of the invention to support the
generation of a rule for extracting a keyword.

In order to achieve the above objects, the invention 
provides a method of generating a structured document 
for a structured document generating apparatus having 
at least an input/output device, a control unit, and a
repository, wherein a non-structured document not
explicitly given the document structure and input from 
the input/output device is converted into a structured 
document explicitly given the document structure, in 
accordance with a document structure definition defin- 
ing the document structure, the method comprising the 
steps of: modifying a given first document structure def- 
inition so as to match the document structure of the 
input non-structured document and generate a second 
document structure definition; the control unit generat- 
ing a parsing rule used for performing a parsing process 
suitable for the document structure of the second docu- 
ment structure definition, by modifying marks constitut- 
ing the second document structure definition and 
modifying the second document structure definition so 
as to make the positional order of the marks in one-to- 
one correspondence; in accordance with the generated 
parsing rule, generating a first structured document 
from the input non-structured document; and in accord- 
ance with difference data between the first document 
structure definition and the second document structure 
definition, converting the generated first structured doc- 
ument into a format matching the first document struc- 
ture definition to thereby generate a second structured 
document. 

With the above configuration, conversion from the 
non-structured document to the structured document 
can be performed, for example, by a parsing module 
which analyzes the document structure through parsing 
on the basis of extracted keywords. The parsing module 
is generated by converting a given document structure 
definition into a parsing rule by means of a parsing rule 
generating module, and by subjecting this parsing rule 
to a process of automatically generating a parsing mod- 
ule. 

In the process of automatically generating a parsing
module, an aggregation of rules such as "A is constituted
by patterns B, C, ..." is input, and a program for
executing a parsing process in accordance with these rules
is output. A particular process to be executed when
each rule is satisfied can be described in this program.
Such a process of automatically generating a parsing
module may be yacc, for example.
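
The idea of turning an aggregation of rules into a parsing program, with an action executed whenever a rule is satisfied, can be illustrated with a minimal Python sketch. This is not yacc and does not use yacc's LALR algorithm; it is a naive, non-backtracking recursive-descent stand-in. The function name make_parser, the rule dictionary format, the "+" suffix convention, and the example rules are all assumptions made for illustration.

```python
def make_parser(rules, on_reduce):
    """Build a parse function from rules of the form {"A": ["B", "C+"]}.

    A symbol that has no rule of its own is treated as a terminal token.
    A trailing "+" means "one or more". on_reduce(name, start, end) is the
    action executed each time a rule is satisfied.
    """

    def parse_symbol(symbol, tokens, pos):
        if symbol not in rules:  # terminal: must equal the next token
            if pos < len(tokens) and tokens[pos] == symbol:
                return pos + 1
            return None
        start = pos
        for part in rules[symbol]:
            pos = parse_part(part, tokens, pos)
            if pos is None:
                return None
        on_reduce(symbol, start, pos)  # rule satisfied: run its action
        return pos

    def parse_part(part, tokens, pos):
        if not part.endswith("+"):
            return parse_symbol(part, tokens, pos)
        pos = parse_symbol(part[:-1], tokens, pos)  # first, mandatory match
        while pos is not None:
            nxt = parse_symbol(part[:-1], tokens, pos)
            if nxt is None or nxt == pos:  # no further progress: stop
                return pos
            pos = nxt
        return None

    def parse(tokens, start_symbol):
        return parse_symbol(start_symbol, tokens, 0) == len(tokens)

    return parse


# Hypothetical rules, loosely modelled on the element names in this text.
reductions = []
rules = {
    "PRESENTREGULATION": ["ARTICLE+"],
    "ARTICLE": ["ARTICLENO.", "TEXT"],
}
parse = make_parser(rules, lambda name, s, e: reductions.append((name, s, e)))
ok = parse(["ARTICLENO.", "TEXT", "ARTICLENO.", "TEXT"], "PRESENTREGULATION")
```

Here the action simply records which rule was satisfied over which token range; a generated parser would use the same hook to record element information.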

With the above configuration, if the same string in 
the same string region is extracted as a plurality of dif- 
ferent keywords, the parsing module of the control unit 
selects a proper one from the plurality of keywords in 
accordance with whether the parsing process succeeds 
or fails. 
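
A minimal sketch of this selection, assuming each string region carries a list of candidate keyword names and that a callable reports whether the parsing process succeeds. The function select_model, the toy acceptance test, and the tie-break that prefers the model with the most keywords are illustrative assumptions, not the embodiment's actual implementation.

```python
from itertools import product


def select_model(regions, parses):
    """Pick one keyword/text model among conflicting keyword assignments.

    regions: one list of candidate keyword names per string region.
    parses:  callable returning True when the parsing process succeeds.
    """
    candidates = [list(choice) for choice in product(*regions)]
    survivors = [model for model in candidates if parses(model)]
    if not survivors:
        return None
    # Assumed tie-break: prefer the model with the most non-"TEXT" entries.
    return max(survivors, key=lambda m: sum(1 for name in m if name != "TEXT"))


# The first region was extracted under two keyword names.
regions = [["OPENINGTITLE", "TITLE"], ["PROMULGATIONDATE"], ["TITLE"]]
# Toy stand-in for the parsing module: accept only a leading OPENINGTITLE
# followed by exactly one TITLE.
toy_parses = lambda m: m[0] == "OPENINGTITLE" and m.count("TITLE") == 1
chosen = select_model(regions, toy_parses)
```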

In practice, the method of generating a structured
document is performed as follows. First, a keyword
extraction module extracts keywords from the
non-structured document and generates an abstract
keyword/text model which represents the
non-structured document as an aggregation of elements
constituted by keywords and other strings.

The parsing module performs a parsing process
relative to the keyword/text model to generate the
structured document. The parsing module is generated
in the following procedure. First, a
given document structure definition is modified so as to
match the document structure of the non-structured
document, and the difference therebetween is stored. Next,
the parsing rule generating module converts the modified
document structure definition into a parsing rule. In
this case, a program is embedded in the parsing rule
which, when each rule is satisfied, i.e., when each
element is detected, records information on the detected
element in the corresponding position of the
keyword/text model. Then, the process of automatically
generating a parsing module generates the parsing module
which realizes the parsing process described in the
parsing rule.

The parsing module generated in the above man- 
ner performs a parsing process relative to the key- 
word/text model generated by the keyword extracting 
module, and generates an interim structured document 
matching the modified document structure definition, in 
accordance with the parsing results recorded in the
keyword/text model. A structured document correcting
module refers to the difference stored when the document
structure definition was modified, and outputs a
structured document matching the document structure 
definition before modification. 

A given layout definition and a second document 
structure definition support the generation of a keyword 
extraction rule used for extracting a keyword. The sec- 
ond document structure definition is generated by mod- 
ifying a preset document structure definition so as to 
match the document structure of the input non-struc- 
tured document. 

Specifically, the keyword extracting module com- 
prises: means for extracting layout information from the 
given layout definition, the layout information including 
information about layout and string used when each ele- 
ment of the document structure is output; means for 
extracting information of connection between elements 
from the second document structure definition; means 
for supporting a determination by a user of which ele- 
ment is extracted as the keyword, by using the informa- 
tion of connection between elements; and means for a 
user to edit layout information extracted from the layout 
definition so as to match the layout of the non-structured
document.
The means for editing layout information com- 
prises: means for notifying the layout information 
extracted for each element of the document structure to 
the user, the layout information being provided for each 
item necessary for extracting a keyword; and means for
the user to modify the notified layout information so as 
to match the layout of the non-structured document or to 
supplement missing information. 

With the above structure, the document structure
and the rule for analyzing the document structure are
generated by modifying the document structure definition
preset for each document. Therefore, the labor required
for designing the document structure for document
structure analysis and for generating the rule
can be reduced. Since a parsing rule dynamically
generated in accordance with the document structure
definition of each document is used, it is possible to
directly generate the structured document matching the
individual document structure without using the common
document structure, and it is not necessary to convert
the structured document from the format matching
the common document structure into the format matching
the individual document structure.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram illustrating the operation
outline of a structured document generating system
according to an embodiment of the invention.

Fig. 2 is a diagram showing an example of a
non-structured document.

Fig. 3 is a diagram showing part of a DTD which is a
document type definition of the SGML format set for the
document shown in Fig. 2.

Fig. 4 is a tree diagram showing part of the DTD shown
in Fig. 3.

Fig. 5 is a diagram showing an example of a keyword
extraction rule in part.

Fig. 6 is a diagram explaining a description constituent
of the format condition of the keyword extraction
rule shown in Fig. 5.

Fig. 7 shows an example of extracted keywords.

Fig. 8 shows an example of a keyword/text model.

Fig. 9 is a block diagram illustrating the operation
outline of a parsing rule generating module.

Fig. 10 shows an example of a modified DTD in part.

Fig. 11 shows an example of DTD difference data.

Fig. 12 shows conversion rules to be referred to
when the parsing rule generating module converts a DTD
into a yacc rule.

Fig. 13 shows an example of an interim yacc rule in
part.

Fig. 14 shows an example of a parsing rule in part.

Fig. 15 shows an example of an interim SGML document
in part.

Fig. 16 illustrates an example of a process by an
SGML document correcting module.

Fig. 17 shows an example of an SGML document
finally generated by the embodiment method.

Fig. 18 is a block diagram showing the hardware
structure of the structured document generation system
of the first embodiment.

Fig. 19 is a diagram illustrating the process outline
to be executed by the parsing module.

Fig. 20 shows an example of a keyword/text model
with tag information being given.

Fig. 21 is a block diagram illustrating the process
outline to be executed by a keyword extraction rule
generating system according to a second embodiment of
the invention.

Fig. 22 shows an example of extraction of
string-corresponding elements.

Fig. 23 shows an example of the modified DTD
shown in Fig. 10 described in BNF notation.

Fig. 24 is a diagram illustrating the procedure of
obtaining string-corresponding elements capable of
appearing at the start of each element.

Fig. 25 shows string-corresponding elements capable
of appearing at the start and end of each element in
the modified DTD described in BNF notation shown in
Fig. 23.

Fig. 26 is a diagram showing the contiguity relationship
between string-corresponding elements in the
modified DTD described in BNF notation shown in
Fig. 23.

Fig. 27 shows an example of string-corresponding
element information.

Fig. 28 shows an example of layout information.

Fig. 29 shows an example of required items necessary
for extracting a keyword.

Fig. 30 shows an example of the process of extracting
a required item from the layout definition.

Fig. 31 is a diagram showing an example of an
interface of a keyword information indicating module.

Fig. 32 is a flow chart illustrating the processes to
be executed by the keyword information indicating
module.

Fig. 33 is a diagram showing an interface of a
supplementary information editing module.

Fig. 34 is a flow chart illustrating the processes to
be executed by the supplementary information editing
module.

Fig. 35 is a flow chart illustrating the process of
generating a format condition.

Fig. 36 is a flow chart illustrating the processes to
be executed by a contiguous element checking module.

Fig. 37 is a diagram showing an example of the
results processed by the contiguous element checking
module.

Fig. 38 is a block diagram showing the hardware
structure of the keyword extraction rule generating
system of the second embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the invention will be described with
reference to the accompanying drawings. In this
embodiment, a structured document generating module
analyzes a document structure through parsing. As the
structured document format, the SGML (Standard
Generalized Markup Language) format is adopted, and as
the document structure definition, a DTD (Document Type
Definition), which is the SGML document type definition,
is used. The process contents and description rules of
SGML and DTD are stipulated in the ISO (International
Organization for Standardization) standard ISO 8879.
The details thereof are explained in "SGML: An Author's
Guide to the Standard Generalized Markup Language",
by Martin Bryan, Addison-Wesley Publishers, 1988. In
this embodiment, yacc is used as the process of
automatically generating a parsing module. The C language
is used for describing a process to be added when each
rule input to yacc is satisfied. The details of yacc
are explained in the document "How to Use yacc
and lex" by Takashi SAITHO, HBJ publishing division,
and the C language is explained in the document
"Programming Language C" by B. W. Kernighan and D. M.
Ritchie, Kyoritsu Publishing Company.

First, the outline of the first embodiment will be
described. Fig. 18 is a diagram showing the hardware
structure of a structured document generating system
of the first embodiment. An input/display device 1
receives an input entered by a user and displays an
input non-structured document, a generated structured
document, or the like. The input/display device 1 is
constituted by a display, a keyboard, a mouse, or the like.
An external repository unit 2 stores a variety of data for
structured document generation. This unit 2 is realized
by a hard disk or the like and is constituted by a
non-structured document repository 21, a structured
document generating rule repository 22, and a structured
document repository 23. A control unit 3 controls each
device constituting the system and processes information
for structured document generation; it is constituted by
a controller 31, an internal memory 32, and a structured
document generating unit 33. The controller 31 reads
data stored in the non-structured document repository
21 and the structured document generating rule repository
22, develops it on the internal memory 32, executes
the processes of the structured document generating unit
33 on the internal memory 32 by using the developed
data, and stores the generated structured document in
the structured document repository 23. The processes
to be executed include a process 34 of generating a
parsing module and a process 35 of generating a
structured document. The parsing module generating
process 34 constitutes part of the structured document
generating process 35. The structured document
generating process 35 is a process of converting a
non-structured document stored in the non-structured
document repository 21 into a structured document by
using a document structure definition, a keyword
extraction rule, a rule conversion regulation, and the
like stored in the structured document generating rule
repository 22. The parsing module generating process 34
and the structured document generating process 35 can be
described in known programming languages.

Next, the outline of processes of the first embodi- 
ment will be described. 

Fig. 1 is a block diagram showing the flow of the
structured document generating process of the
structured document generating system of the embodiment.
A non-structured document 101 is electronic document
information of sequential character strings generated by
a word processor, a character recognition apparatus, or
the like, and is input to the system from the input/display
device 1. A keyword extraction module 102 extracts
keywords from the non-structured document 101 in
accordance with a keyword extraction rule 103. A keyword
is a character string expressing the document structure
of the non-structured document 101. The keyword
extraction module 102 then separates the non-structured
document 101 into keywords and other strings and
generates an abstract keyword/text model 104 as an
aggregation of these elements of keywords and other
strings. A parsing module 105 performs the parsing
process described in a parsing rule 111 to analyze the
document structure, the parsing rule 111 having been
generated by a parsing rule generating module 110.

The outline of a method of generating the parsing
module 105 is as follows. First, a DTD correcting module
107 modifies a DTD 106 to generate a modified DTD 108
matching the description format of the non-structured
document 101, and stores the difference information
as DTD difference data 109. The DTD 106 is a prepared
standard document type definition and does not
necessarily match the input non-structured document 101.
This modification is therefore performed in accordance
with the result of a comparison, made by a system user,
between the non-structured document 101 and the DTD 106.
The parsing rule generating module 110 refers to a rule
conversion regulation 112 and generates the parsing rule
111 from the modified DTD 108. Then, yacc 113, which is
the process of generating a parsing module in this
embodiment, generates the parsing module 105 in
accordance with the parsing rule 111, the parsing
module 105 realizing the parsing process described by
the parsing rule 111.

The parsing module 105 performs a parsing process
on the keyword/text model 104 and affixes tags
representative of the document structure to generate an
interim SGML document 114. This document is a document
instance formed in conformity with the modified
DTD 108. Therefore, by referring to the DTD difference
data 109, an SGML document correcting module 115
modifies the interim SGML document 114 to generate
an SGML document 116 matching the DTD 106.

Each process of the embodiment will be detailed
next.

Fig. 2 shows an example of the non-structured document
101 shown in Fig. 1. This document is obtained
through character recognition from an existing paper
document regarding a law. Although there is no
explicit description showing the document structure, the
document lays out each component so as to be easy to
read, using spaces or the like. In order for the document
processing system to utilize such a text type electronic
document, a document type definition (DTD) is set.
Fig. 3 shows an example of a DTD for the non-structured
document shown in Fig. 2. The opening first line (line
number 1; other lines are also referred to by line
numbers) indicates that the document structure definition
has the name "LAW". The second to seventeenth lines
indicate definitions of elements. The name of an element
is described after "ELEMENT", and after this a model
group is described between "(" and ")". The model
group is an aggregation of constituents which form the
element. These constituents are one or more elements
and content tokens representative of data, such as
"#PCDATA"; model groups themselves may also be nested
and used as such constituents. The second line
indicates that the element "LAW" is constituted by a
series of the elements "PROMULGATION",
"ESTABLISHEDREGULATIONNO.", "TITLE", and
"PRESENTREGULATION". The third line indicates that the
element "PROMULGATION" is constituted by a series
of the elements "PROMULGATIONSTATEMENT",
"PROMULGATIONDATE", and "PROMULGATIONOFFICER".
The eleventh line indicates that the element
"PRESENTREGULATION" is constituted by one or
more "ARTICLE" elements. An element affixed with "+",
such as "ARTICLE", may appear one or more times.
An element affixed with an asterisk "*" may appear
any number of times, including zero. The element
"#PCDATA" at the fourth, fifth, and seventh to tenth lines
means that the corresponding elements
"PROMULGATIONSTATEMENT", "PROMULGATIONDATE",
"OFFICIALTITLE", "NAME", "ESTABLISHEDREGULATIONNO.",
and "TITLE" each have a string indicating the
contents of the element. The document structure is
shown as a tree diagram in Fig. 4.
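
To make the reading of such element definitions concrete, the following Python sketch parses a small DTD-like fragment into a table of constituents and occurrence indicators. The fragment is reconstructed from the prose above for illustration only and is not the actual content of Fig. 3; the function name parse_element_declarations is hypothetical, and the sketch handles only flat, comma-separated model groups (no nesting, no "|" or "&" connectors).

```python
import re

# Illustrative fragment reconstructed from the description, NOT the real Fig. 3.
DTD_FRAGMENT = """
<!ELEMENT LAW (PROMULGATION, ESTABLISHEDREGULATIONNO., TITLE, PRESENTREGULATION)>
<!ELEMENT PROMULGATIONSTATEMENT (#PCDATA)>
<!ELEMENT PRESENTREGULATION (ARTICLE+)>
"""

# Element name after "ELEMENT", then a model group between "(" and ")".
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+\((.*?)\)\s*>", re.S)


def parse_element_declarations(dtd_text):
    """Map each element name to a list of (constituent, occurrence) pairs."""
    elements = {}
    for name, group in ELEMENT_DECL.findall(dtd_text):
        constituents = []
        for token in group.split(","):
            token = token.strip()
            occurrence = ""
            if token and token[-1] in "+*?":  # occurrence indicator suffix
                token, occurrence = token[:-1], token[-1]
            constituents.append((token, occurrence))
        elements[name] = constituents
    return elements


structure = parse_element_declarations(DTD_FRAGMENT)
```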

In this system, the document structure of a
non-structured document such as shown in Fig. 2 is
analyzed by directly using a DTD such as shown in Fig. 3
to generate a structured document which matches the DTD.

The keyword extraction module 102 shown in Fig. 1
refers to the keyword extraction rule 103 to extract
keywords from the non-structured document 101 and
generate the keyword/text model 104. An example of
the keyword extraction rule 103 is shown in Fig. 5. This
rule is an aggregation of pairs, each combining the name
of an element to be extracted as a keyword with a layout
condition which describes the information about layout
and string used for the extraction. In Fig. 5, the first
item on each line is the name of a keyword, and the second
and following items are the layout conditions. Fig. 6
gives an explanation of the description constituents of
the layout conditions shown in Fig. 5. For example, the
first line shown in Fig. 5 means that the format
conditions of the keyword "OPENINGTITLE" are that a
character "Q" is at the three-space position from the
line head, a string of optional length follows, and the
line ends with the string "LAW" or "REGULATION". The
fourth line means that the format conditions of the
keyword "PROMULGATIONDATE" are that the string "SHOWA"
or "TAISHO" is at an optional-space position from the
line head, followed by INTEGER -> "YEAR" -> INTEGER ->
"MONTH" -> INTEGER -> "DAY" in this order to end the
line.
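
As a rough illustration, the format conditions described for "PROMULGATIONDATE" could be rendered as a regular expression. This is only a sketch: the actual rule syntax of Fig. 5 is not reproduced here, and the pattern, the tolerance for spaces between items, and the function name match_promulgation_date are assumptions.

```python
import re

# Hypothetical regex rendering of the "PROMULGATIONDATE" format conditions:
# optional leading spaces, an era name, then INTEGER "YEAR" INTEGER "MONTH"
# INTEGER "DAY", ending the line. Spaces between items are tolerated.
PROMULGATION_DATE = re.compile(
    r"^\s*(SHOWA|TAISHO)\s*(\d+)\s*YEAR\s*(\d+)\s*MONTH\s*(\d+)\s*DAY\s*$"
)


def match_promulgation_date(line):
    """Return (era, year, month, day) if the line satisfies the conditions."""
    m = PROMULGATION_DATE.match(line)
    if m is None:
        return None
    era, year, month, day = m.groups()
    return era, int(year), int(month), int(day)
```

A line such as "   SHOWA 35 YEAR 4 MONTH 1 DAY" would be extracted as the keyword, whereas a line beginning with any other era name would not.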

The keyword extraction module 102 shown in Fig. 1 checks
whether the electronic document contains a string which
matches the format conditions of the keyword extraction
rule. If there is a matching string, it is extracted as the
keyword (an example of an extracted keyword is shown in
Fig. 7). Thereafter, the document is separated into
keywords and other strings to generate the abstract
keyword/text model 104 which is an aggregation of keywords
and other strings. Specifically, if there is a string which
is not a keyword between keywords, it is considered to be a
"text" string other than keywords, and a keyword/text model
such as shown in Fig. 8 is configured. The keyword/text
model shown in Fig. 8 starts from the keyword
"OPENINGTITLE", followed by a keyword "PROMULGATIONDATE" ->
a keyword "ESTABLISHEDREGULATIONNO." -> a keyword
"PROMULGATIONSTATEMENT" -> a keyword "TITLE" -> a keyword
"ARTICLENO.". Since a string which is not a keyword is
sandwiched between the keyword "ARTICLENO." and the next
keyword "PARAGRAPHNO.", this string is considered as a text.
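
The separation of a document into keywords and texts can be
sketched as follows. This is only an illustration under
stated assumptions: the rule syntax of Fig. 5 is not
reproduced, the two regular expressions are hypothetical
equivalents of layout conditions (the "ARTICLENO." pattern
in particular is invented for the example), and the
function name build_keyword_text_model is not taken from
the specification.

```python
import re

# Hypothetical regular-expression equivalents of two layout conditions.
# The real rule format of Fig. 5 is different; these are assumptions.
KEYWORD_RULES = [
    ("PROMULGATIONDATE", re.compile(
        r"^\s*(SHOWA|TAISHO)\s*\d+\s*YEAR\s*\d+\s*MONTH\s*\d+\s*DAY\s*$")),
    ("ARTICLENO.", re.compile(r"^\s*ARTICLE\s+\d+\s*$")),
]


def build_keyword_text_model(lines):
    """Return a list of (kind, keyword_name, string) tokens."""
    model = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no content
        name = next((n for n, p in KEYWORD_RULES if p.match(line)), None)
        if name is not None:
            model.append(("KEY", name, stripped))
        elif model and model[-1][0] == "TEXT":
            # Consecutive non-keyword lines form a single text token.
            model[-1] = ("TEXT", None, model[-1][2] + "\n" + stripped)
        else:
            model.append(("TEXT", None, stripped))
    return model
```

Every string between two keywords ends up as exactly one
text token, which corresponds to the two-element model
(keywords and other strings) described above.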

There is a case wherein the same string in the same region
of the document is extracted as a plurality of keywords.
For example, in the example of the extracted keywords shown
in Fig. 7, the string
"OMPREFECTUREFLOODDEFENCESIGNALREGULATION" at the first and
second lines is extracted as the keyword for both keyword
names "OPENINGTITLE" and "TITLE". In such a case, it is
assumed that the keywords are extracted from the same
region, and a plurality of keyword/text models
corresponding to each keyword are generated. The
keyword/text model shown in Fig. 8 is formed by selecting
"OPENINGTITLE" from the keyword names "OPENINGTITLE" and
"TITLE" which conflict for this region. Of the plurality of
keyword/text models, a model which the parsing module 105
fails to parse is determined to be an improper keyword/text
model. If there is a plurality of keyword/text models for
which the parsing succeeded, an optimum one is selected in
accordance with a criterion such as the number of extracted
keywords, so that a single SGML document is eventually
generated from the optimum keyword/text model.
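
The selection among competing keyword/text models can be
sketched as below. The sketch assumes that each model is a
list of (kind, name) tokens and that parse is a callable
standing in for the parsing module 105 which returns True
on success; both are hypothetical. The criterion used is
the one named in the text, the number of extracted
keywords.

```python
def select_model(models, parse):
    """Pick the successfully parsed model with the most keywords.

    models: list of keyword/text models, each a list of (kind, name) tokens.
    parse:  hypothetical stand-in for the parsing module; True on success.
    Returns None when no model can be parsed.
    """
    parsed = [m for m in models if parse(m)]
    if not parsed:
        return None
    return max(parsed, key=lambda m: sum(1 for tok in m if tok[0] == "KEY"))
```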

The parsing module 105 shown in Fig. 1 performs a parsing
process for the keyword/text model 104 in accordance with
the parsing rule 111. First, the processes of modifying DTD
106 by the DTD correcting module 107 and generating the
parsing rule 111 will be described with reference to
Fig. 9.

First, the DTD correcting module 107 manually generates a
modified DTD 108 by modifying the description contents of
DTD 106 set for the non-structured document so as to match
the description format of the non-structured document, and
stores the difference as the DTD difference data 109. The
reason why such correction becomes necessary is that there
may be a contradiction of the description items and order
between the non-structured document 101 and DTD 106 used
for this system. For example, although DTD 106 shown in
Fig. 3 is prepared for the non-structured document 101
shown in Fig. 2, the element for the opening title "O^A
PREFECTURE FLOOD DEFENCE SIGNAL REGULATION" at the first
line shown in Fig. 2 is not given in DTD 106 shown in
Fig. 3. In DTD 106 shown in Fig. 3, elements are disposed
in the order of "PROMULGATIONSTATEMENT -> PROMULGATIONDATE
-> ESTABLISHEDREGULATIONNO. -> TITLE", whereas in the
non-structured document shown in Fig. 2, the elements are
disposed in the order of "PROMULGATIONDATE ->
ESTABLISHEDREGULATIONNO. -> PROMULGATIONSTATEMENT ->
TITLE".

In order to eliminate such contradiction, the modified DTD
108 shown in Fig. 10 is manually generated. The meshed
portion in Fig. 10 shows the modified elements. In order to
explicitly indicate the modified portion, this portion is
enclosed in an element <CHANGE>. The modified portion of
the original DTD 106 is stored as the DTD difference data
109 such as shown in Fig. 11. Also in this case, the
modified portion is enclosed in the element <CHANGE>.

If there is no contradiction of the document struc- 
ture between the non-structured document and DTD 
106, it is not necessary to generate the modified DTD 
108 and DTD difference data 109. 

After DTD 106 is modified where necessary, the 
parsing rule generating module 110 executes a rule 
conversion process 906 in accordance with the rule 
conversion regulation 112 shown in Fig. 12 to convert 
the element definition described in the modified DTD 
108 into an interim yacc rule 908. Each interim yacc rule
(hereinafter called a "production rule") is constituted by
left and right sides partitioned by a colon, such as
"A : B C;". If the pattern described at the right side is
present, the rule is satisfied and the element at the left
side is configured. In this example of the production rule
"A : B C;", an element A is generated if a pattern "B C" is
present.

In DTD, the production rule having the right side of
"#PCDATA" means that the left side element corresponds
directly to the string of the document structure analysis
result. In converting the production rule into the interim
yacc rule, if the left side element is an element extracted
as a keyword in accordance with the keyword extraction rule
shown in Fig. 5, the #PCDATA is converted into
[#KEY "(KEYWORDNAME)"]. #PCDATA in the other production
rules is converted into [#TEXT], meaning a string other
than the keyword. For example, the production rule
converted into [OPENINGTITLE : #KEY "OPENINGTITLE"]
indicates that the keyword "OPENINGTITLE" corresponds to
the element "OPENINGTITLE". The production rule converted
into [ARTICLESTATEMENT : #TEXT] indicates that a string
other than the keyword corresponds to the element
"ARTICLESTATEMENT".
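
The treatment of "#PCDATA" during rule conversion can be
sketched as follows. The function name and the set
keyword_elements are assumptions introduced for
illustration; only the two output forms, #KEY "NAME" and
#TEXT, are taken from the description above.

```python
def convert_pcdata_rule(element, keyword_elements):
    """Convert 'element : #PCDATA' into an interim production rule.

    keyword_elements: names of the elements that the keyword extraction
    rule extracts as keywords (hypothetical input for this sketch).
    """
    if element in keyword_elements:
        return f'{element} : #KEY "{element}"'
    return f"{element} : #TEXT"
```

With keyword_elements = {"OPENINGTITLE"}, the element
"OPENINGTITLE" yields the #KEY form and the element
"ARTICLESTATEMENT" yields the #TEXT form, as in the two
examples given in the text.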

Fig. 13 shows an example of the yacc rule converted from
the modified DTD shown in Fig. 10. For example, the
definition at the fifth line shown in Fig. 10 is converted
into the production rules at the fourth and fifth lines
shown in Fig. 13. In this case, the
"PROMULGATIONSTATEMENT ?" shown in Fig. 10 is converted
into "optO" at the fourth line shown in Fig. 13 in
accordance with the rule on the second line from the bottom
of Fig. 12. The definition of "optO" is described at the
fifth line of Fig. 13.
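
How an optional element marked with "?" might be rewritten
into an auxiliary rule can be sketched as follows. The
conversion regulation of Fig. 12 and the exact rules of
Fig. 13 are not reproduced here, so the auxiliary names
(opt0, opt1, ...), the "/* empty */" alternative, and the
content model used in the usage note are assumptions
following common yacc practice.

```python
def expand_optional(lhs, items):
    """Rewrite 'X?' items on a right side into auxiliary opt rules.

    Returns the main production rule followed by one auxiliary rule per
    optional item. The naming scheme is an assumption of this sketch.
    """
    rules, rhs = [], []
    for item in items:
        if item.endswith("?"):
            aux = f"opt{len(rules)}"
            rhs.append(aux)
            # The auxiliary rule matches either nothing or the element.
            rules.append(f"{aux} : /* empty */ | {item[:-1]} ;")
        else:
            rhs.append(item)
    return [f"{lhs} : {' '.join(rhs)} ;"] + rules
```

For a hypothetical definition of "PROMULGATION" consisting
of "PROMULGATIONDATE", "ESTABLISHEDREGULATIONNO." and
"PROMULGATIONSTATEMENT?", the function returns one main
rule that refers to opt0 and one rule that defines opt0.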

If such an interim yacc rule is used, the parsing module
generated by yacc outputs only a success/failure of parsing
and does not output the correspondence between the
keyword/text model and elements. However, in order to
generate the structured document by using the results of
parsing, it becomes necessary, when each element analysis
succeeds, i.e., when each interim rule is satisfied, to
add, to the keyword/text model, information (hereinafter
called "tag information") indicating which element
corresponds to each constituent of the keyword/text model.
To this end, the parsing rule generating module 110
executes a C language program embedding process 909 for the
interim yacc rule 908 in order to add the tag information
to the keyword/text model and generate the parsing rule
111. An example of the parsing rule 910 is shown in
Fig. 14. The meshed portions illustrate the process of the
embedded C language programs. In this process, pieces of
tag information corresponding to the right side elements of
the production rule are coupled and the tag information
corresponding to the left side element of the production
rule is generated.

Referring back to Fig. 1, yacc 113 receives the generated
parsing rule 111 and generates a parsing module 105 which
performs a parsing process in accordance with the parsing
rule 111. The only manual operation required during the
process of generating the parsing module 105 from DTD 106
is the operation of changing the document structure
definition so as to match the description format of the
non-structured document and generating the DTD difference
data 109. The other operations are automatically performed.

The parsing module 105 analyzes the document structure for
the keyword/text model 104 to verify whether the
keyword/text model 104 matches the parsing rule 111, and
adds the tag information representative of the document
structure detected during this process to the keyword/text
model 104. The interim SGML document 114 is generated from
the keyword/text model to which the tag information has
been added.

Keywords and texts (hereinafter collectively called
"tokens") of the keyword/text model both correspond to
"#PCDATA" in DTD of the tree diagram shown in Fig. 4, i.e.,
to the string representing the contents of each element.
The keyword is a string in one-to-one correspondence with
an element, whereas the text is a string not yet having
correspondence with any element. The parsing process
corresponds to generating the tree structure shown in
Fig. 4 from the one-dimensional arrangement of keywords and
texts, i.e., the keyword/text model.

The outline of this process by the parsing module 105 is
illustrated in Fig. 19. The parsing module 105 generated by
yacc 113 is constituted by a state transition table 2004
and a parser 2003 which performs the parsing process while
referring to the state transition table 2004. Described in
the state transition table 2004 are the tokens acceptable
in each state of parsing, and information on the state to
which the parsing changes when a token is accepted. The
parser 2003 sequentially reads tokens, starting from the
opening token, the tokens being the constituents of the
keyword/text model 2001 (2005). If it is judged in a
certain state that the input token cannot be accepted, it
is judged that parsing failed (2006 -> 2007). Conversely,
if acceptable, the state of parsing advances one step in
accordance with the state transition table (2006 -> 2008).
In this state, if any one of the production rules of the
parsing rule 111 can be satisfied, the tag information
corresponding to the production rule is added to the
keyword/text model 2001 (2009 -> 2010: this process is
realized by the inserted programs shown in Fig. 14).
Specifically, if a single token corresponds to a certain
element, start-tag information and end-tag information
representative of the name of the element are added to the
token as a pre-tag and a post-tag. For an element
corresponding to a plurality of tokens, the start-tag
information and end-tag information are added to the start
and end tokens, respectively. The addition of tag
information is detailed later.
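
The accept/reject loop of the parser 2003 can be sketched
as below. This is a much simplified finite-state
illustration, not the stack-based tables that yacc actually
generates: the table layout, the state names, and the
function run_parser are all assumptions, and the reduction
of production rules is omitted.

```python
def run_parser(tokens, table, start, accepting):
    """Drive a token sequence through a state transition table.

    table:     dict mapping (state, token_name) to the next state; a
               missing entry means the token is not acceptable.
    accepting: states that count as "normal termination".
    """
    state = start
    for token in tokens:
        key = (state, token)
        if key not in table:
            return False  # token cannot be accepted: parsing failed
        state = table[key]  # advance one step
    return state in accepting
```

A hypothetical table that accepts an "ARTICLENO." keyword,
a text, and then any number of "PARAGRAPHNO." and text
pairs would have four entries, and a model that begins with
a text would be rejected at the first token.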

When the last token is input and the parsing changes to the
state of "normal termination", it is judged that the
document structure analysis of the keyword/text model has
succeeded.

The process when a production rule is satisfied during the
parsing will be detailed with reference to the keyword/text
model shown in Fig. 8 and the rule shown in Fig. 13. This
process realizes the following two functions.

(1) To what element a keyword or text corresponds is
determined. For example, if the keyword "ARTICLENO." at the
sixth line of the keyword/text model shown in Fig. 8 is
input, the production rule at the thirteenth line of
Fig. 13 is satisfied (which production rule is satisfied in
a certain state is described in the state transition table
2004), and the keyword "ARTICLENO." corresponds to the
element "ARTICLENO.". In this case, the start-tag
information and end-tag information of the "ARTICLENO." are
added to the pre-tag and post-tag of the keyword
"ARTICLENO." of the keyword/text model (seventeenth and
eighteenth lines in Fig. 20). Next, when the text at the
seventh line of Fig. 8 is input, the production rule at the
fourteenth line of Fig. 13 is satisfied so that this text
is considered to correspond to the element
"ARTICLESTATEMENT". The start-tag information and end-tag
information of the "ARTICLESTATEMENT" are added to the
pre-tag and post-tag of the TEXT (twenty-first and
twenty-second lines in Fig. 20).

(2) Adjacent elements are summarized to a more abstract
element.

For example, in Fig. 4, the adjacent elements
"PARAGRAPHNO." and "PARAGRAPHSTATEMENT" are summarized to a
more abstract "PARAGRAPH". In the example of the
keyword/text model shown in Fig. 8, the adjacent
"PARAGRAPHNO." and the text (corresponding to
"PARAGRAPHSTATEMENT") at the eighth and ninth lines are
summarized to the one element "PARAGRAPH" in accordance
with the production rule at the sixteenth line of Fig. 13.
If this production rule is satisfied, the start-tag
information of "PARAGRAPH" is added to the keyword
"PARAGRAPHNO." at the eighth line of Fig. 8, and the
end-tag information is added to the text at the ninth line
(twenty-fourth and twenty-eighth lines in Fig. 20). The
same operation is performed for the combinations of tenth
and eleventh lines, twelfth and thirteenth lines, and
fourteenth and fifteenth lines in Fig. 8.
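
The two functions above both amount to attaching start-tag
information to the first token and end-tag information to
the last token covered by a satisfied production rule. A
minimal sketch follows; the Token class, its field names,
and tag_span are hypothetical and do not reproduce the
embedded C programs of Fig. 14.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One constituent of the keyword/text model with its tag information."""
    string: str
    pre: list = field(default_factory=list)   # start-tag information
    post: list = field(default_factory=list)  # end-tag information


def tag_span(tokens, first, last, element):
    """Record that tokens[first..last] together form `element`.

    An element recognized later encloses those recognized earlier, so its
    start tag goes in front of the existing pre-tags and its end tag
    behind the existing post-tags.
    """
    tokens[first].pre.insert(0, f"<{element}>")
    tokens[last].post.append(f"</{element}>")
```

Calling tag_span with first equal to last covers function
(1), a single token corresponding to one element; a wider
span covers function (2).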

The adjacent "ARTICLENO." (sixth line) and
"ARTICLESTATEMENT" (seventh line) and a plurality of
"PARAGRAPH" elements (eighth to fifteenth lines) can be
summarized to the element "ARTICLE" in accordance with the
production rules at the twelfth and fifteenth lines in
Fig. 13. In this case, the start-tag information of
"ARTICLE" is added to the pre-tag of the keyword
"ARTICLENO." at the sixth line, and the end-tag information
is added to the post-tag of the text at the fifteenth line
(in Fig. 20, only the addition of the start-tag information
of "ARTICLE" is illustrated at the seventeenth line).

When elements such as "ARTICLE" and "PARAGRAPH", whose
constituents include keywords representing a number (in
this case, "ARTICLENO." and "PARAGRAPHNO."), are
summarized, the first number and the continuity between
numbers are checked. Namely, it is checked whether the
numbering begins with "1" and thereafter the numbers 1, 2,
3, ... are continuous.

The above process is sequentially performed for each input
token of the keyword/text model 104. If the tree structure
shown in Fig. 4 having one root (in the example shown in
Fig. 4, "LAW") can be obtained, it is judged that the
keyword/text model 104 matches the parsing rule 111 and the
parsing has succeeded. Conversely, if a token input in a
certain state during the parsing is not acceptable, i.e.,
if the keyword/text model 104 does not match the parsing
rule 111, it is judged that the parsing has failed. If, in
the continuity check of numbers of the function (2)
described above, the first number is abnormal or the
continuity between numbers is not retained, it is judged
that the document structure analysis has failed. Examples
of such cases are numbering that starts from the number 3
instead of the number 1, and numbers that are skipped as in
1, 2, and 5.
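
The continuity check can be sketched in a few lines. The
function name is hypothetical, and the numbers are assumed
to have already been extracted from the "ARTICLENO." or
"PARAGRAPHNO." keywords as integers.

```python
def numbers_are_continuous(numbers):
    """True if the sequence is exactly 1, 2, 3, ... with no gaps."""
    return all(n == expected
               for expected, n in enumerate(numbers, start=1))
```

The sequence 1, 2, 3 passes, whereas a sequence that starts
from 3 or the sequence 1, 2, 5 is rejected, matching the
two failure cases named above.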

If the parsing has succeeded, the parsing module 105
outputs the interim SGML document 114 in accordance with
the tag information given to the keyword/text model 104.
Specifically, the output interim SGML document 114 has
tags, corresponding to the start-tag information and
end-tag information, added to the front and back of the
string corresponding to each token of the keyword/text
model 104. An example of the interim SGML document 114 is
shown in Fig. 15.
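
Producing the interim SGML text from a tagged keyword/text
model can be sketched as follows. Each tagged token is
assumed to be a (pre_tags, string, post_tags) triple, and
one output line per token is an arbitrary layout choice;
Fig. 15 may arrange the tags differently.

```python
def render_sgml(tagged_tokens):
    """Join each token's pre-tags, string, and post-tags into SGML text."""
    lines = []
    for pre_tags, string, post_tags in tagged_tokens:
        lines.append("".join(pre_tags) + string + "".join(post_tags))
    return "\n".join(lines)
```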

As seen from this example, the tag information includes the
start-tag information and end-tag information, and the
end-tag information is not always positioned near the
start-tag information. For example, although the end-tag
information </ARTICLENO.> for the start-tag information
<ARTICLENO.> is just two lines below, the end-tag
information </ARTICLE> for the start-tag information
<ARTICLE> is far below, outside the drawing space.
Therefore, if the document structure is to be manually
modified when the interim SGML document is generated, it is
required to search for the corresponding start-tag
information and end-tag information over the whole of the
document, requiring a large amount of labor. In this
embodiment, necessary modification is completed at the
stage of DTD so that the generated interim SGML document
114 matches the input non-structured document 101 and the
modification described above is not necessary.

If a plurality of keywords are extracted from the same
region, a plurality of keyword/text models are generated.
In this case, the parsing process is performed for all the
keyword/text models. If an erroneous keyword is contained,
the parsing fails. If there are a plurality of keyword/text
models which have succeeded in the parsing, an optimum
keyword/text model is selected in accordance with, for
example, the condition that the number of extracted
keywords is large, and a corresponding interim SGML
document is output. This will be described by using the
example shown in Fig. 7 in which two keywords
"OPENINGTITLE" and "TITLE" are extracted from the same
string of the non-structured document. The keyword/text
model generated by selecting the "TITLE" fails in the
parsing because the first line in the modified portion of
the modified DTD stipulates that the "OPENINGTITLE" can
appear at the top of the "LAW" but the "TITLE" cannot
appear at the top of the "LAW". Therefore, the interim SGML
document for the keyword/text model generated by selecting
the "TITLE" is not output. On the other hand, the
keyword/text model generated by selecting the
"OPENINGTITLE" succeeds in the parsing, and the
corresponding interim SGML document is output as shown in
Fig. 15.

If there is the DTD difference data 109, the SGML document
correcting module 115 modifies the interim SGML document
114 in accordance with the DTD difference data. The
contents of a particular process will be described with
reference to Fig. 16. The SGML document correcting module
115 generates an instance 1602 of modified part in DTD,
which is a partial SGML document corresponding to the
contents described in the DTD difference data 109. In this
case, a string "#PCDATA" representing the contents of the
document structure is required to be replaced by a
corresponding string. A change module 1603 for the interim
SGML document replaces the string by another string
representative of the contents of the element having the
same name. For example, the "#PCDATA" sandwiched between
the two tags <PROMULGATIONSTATEMENT> and
</PROMULGATIONSTATEMENT> in the instance 1602 of modified
part in DTD is replaced by a string
"AAPREFECTUREFLOODDEFENCESIGNALREGULATIONISTOBEPROMULGATEDASINTHEFOLLOWING"
sandwiched between the same tags in the changes 1603 in the
interim SGML document. Similarly, the "#PCDATA" sandwiched
between the two tags <PROMULGATIONDATE> and
</PROMULGATIONDATE> is replaced by a string "SHOWA 24,
OCTOBER, 6", and the "#PCDATA" sandwiched between the two
tags <ESTABLISHEDREGULATIONNO.> and
</ESTABLISHEDREGULATIONNO.> is replaced by a string
"AAPREFECTUREREGULATIONNO.78". Where, as in the case of the
"#PCDATA" sandwiched between the two tags <OFFICIALTITLE>
and </OFFICIALTITLE> in the instance 1602 of modified part
in DTD, no element having the same name is included in the
changes 1603 in the interim SGML document, a string "NONE"
is forcibly inserted.

The instance 1602 of modified part in DTD generated by the
replacement process is substituted for the modified portion
of the interim SGML document 114 of Fig. 1, i.e., in the
example shown in Fig. 15, the portion sandwiched between
the two tags <CHANGE> and </CHANGE>. In this manner, the
SGML document matching DTD 106 preset for subject documents
can be generated. An example of the SGML document 116 is
shown in Fig. 17. Since the individual document structure
is directly reflected upon the SGML document, it is not
necessary, as in the conventional case, to convert the
document instance into the individual document structure.
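
The correction of the interim SGML document can be sketched
as two steps: filling each "#PCDATA" placeholder of the
instance with the content of the same-named element found
in the <CHANGE> portion (or "NONE" when there is none), and
then substituting the filled instance for the <CHANGE>
portion. The function names are hypothetical, and the
regular expressions assume flat, non-nested elements inside
the changed portion, which is a simplification.

```python
import re


def fill_instance(instance, changed_portion):
    """Replace #PCDATA in `instance` by same-named element contents."""
    contents = dict(re.findall(r"<([^/<>]+)>([^<]*)</\1>", changed_portion))

    def replace(match):
        name = match.group(1)
        return f"<{name}>{contents.get(name, 'NONE')}</{name}>"

    return re.sub(r"<([^/<>]+)>#PCDATA</\1>", replace, instance)


def correct_document(interim, instance):
    """Substitute the filled instance for the <CHANGE> portion."""
    match = re.search(r"<CHANGE>(.*?)</CHANGE>", interim, flags=re.S)
    if match is None:
        return interim  # no modified portion, nothing to correct
    filled = fill_instance(instance, match.group(1))
    return interim[:match.start()] + filled + interim[match.end():]
```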

Programs realizing the first embodiment may be 
stored in a storage device such as a hard disk, a floppy 
disk, and an optical disk. 

According to the first embodiment described above, 
the parsing rule 111 used for the document structure 
analysis is directly generated from the document struc- 
ture definition set for subject documents. It is therefore 
possible to reduce labor required for the generation of a 
rule. Since the document instance is generated through 
parsing in accordance with the document structure 
described in the document structure definition of each 
document, it is not necessary to convert the document 
instance obtained through parsing, from the format 
matching the common document structure into the for- 
mat matching the individual document structure. 

Next, the second embodiment will be described. 
This embodiment pertains to a method of supporting the
generation of the keyword extraction rule 103 by using the
modified DTD and given layout information.

Similar to the first embodiment, also in this second 
embodiment, an SGML format is adopted as an exam- 
ple of the structured document format, and as the docu- 
ment structure definition, a DTD is used which is a 
document type definition for SGML set for subject docu- 
ments. 

Fig. 38 is a diagram showing the hardware structure of a
keyword extraction rule generating system of the second
embodiment. An input/display device 3910 receives an input
entered by a user and displays information about layout, a
generated keyword extraction rule, or the like. The
input/display device 3910 is constituted by a display, a
keyboard, a mouse, or the like. An external repository unit
3920 stores a variety of data for keyword extraction rule
generation. This unit 3920 is realized by a hard disk or
the like and constituted by a modified DTD repository 3921,
a layout definition repository 3922, a string-corresponding
element information repository 3923, a layout information
repository 3924, and a keyword extraction rule repository
3925. A control unit 3930 controls each device constituting
the system, processes information for keyword extraction
rule generation, and is constituted by a controller 3931,
an internal memory 3932, and a keyword extraction rule
generating module 3933. The controller 3931 reads data
stored in the modified DTD repository 3921 and layout
definition repository 3922, develops it on the internal
memory 3932, executes processes of the keyword extraction
rule generating module 3933 on the internal memory 3932 by
using the developed data, and stores the generated
string-corresponding element information and layout
information respectively in the string-corresponding
element information repository 3923 and layout information
repository 3924. The processes to be executed include a
process 3934 of extracting document structure information
and a process 3935 of extracting layout information. A
process 3936 of generating a keyword extraction rule
notifies an operator via the input/display device 3910 of
the string-corresponding element information stored in the
string-corresponding element information repository 3923
and the layout information stored in the layout information
repository 3924, and receives, if necessary, supplementary
information from the operator via the input/display device
3910. The process 3934 of extracting document structure
information, the process 3935 of extracting layout
information, and the process 3936 of generating a keyword
extraction rule can be described by known programming
languages.

Next, the outline of processes of the second embodiment
will be described.

Fig. 21 is a block diagram showing a flow of the keyword
extraction rule generating system. Reference numeral 2201
represents a modified DTD (same as DTD 108 shown in Fig. 1)
obtained by modifying the document structure definition set
for subject documents so as to match an input
non-structured document. The modified DTD 2201 defines
elements of the non-structured document and the
relationship between elements. A document structure
information extracting module 2202 refers to the modified
DTD 2201 and generates string-corresponding element
information 2203 describing elements in direct
correspondence with a string (hereinafter called
"string-corresponding elements") and a contiguity
relationship between elements.

Reference numeral 2204 represents a layout definition set
for subject documents which defines with what layout each
element is output. A layout information extracting module
2205 refers to the layout definition 2204 and extracts as
many as possible of the items necessary for generating a
keyword extraction rule from the layout used for outputting
each element and from the information of an output string.
Each item itself is hereinafter called a "required item",
and the information extracted for each item is called a
"required item content". Layout information 2206 describes
the required item content for each string-corresponding
element.

EP 0 768 612 A2

A keyword extraction rule generating module 2207 informs an
operator, via an input/display device 2211, of the required
item content for each string-corresponding element in the
layout information 2206. This module 2207 receives
information entered by the operator, modifies the required
item content, and generates a keyword extraction rule 2212
in accordance with the modified required item content.

The process by the keyword extraction rule generating
module 2207 will be described in more detail. A keyword
information indicator module 2208 informs the operator of
the name of a string-corresponding element described in the
string-corresponding element information 2203. If a
string-corresponding element is set as a
keyword-corresponding element and given a format condition,
this format condition is also displayed together with the
string-corresponding element.

A supplementary information editing module 2209 sets the
format condition of each string-corresponding element. The
supplementary information editing module 2209 refers to the
layout information 2206 and displays the required item
content of the string-corresponding element selected by the
operator. If the displayed required item content is
different from the layout and strings of the non-structured
document, the operator corrects it. The content of a
required item is given by the operator if it cannot be
extracted by the layout information extracting module 2205.
In this manner, all the required item contents are edited
so that they match the layout and strings of the
non-structured document. After all the required items are
edited, the supplementary information editing module 2209
generates the format condition used for keyword extraction
by using the required item contents. By using the layout
condition as a return argument, the process is passed to
the keyword information indicator module 2208.

The keyword information indicator module 2208 sets as the
keyword-corresponding element the string-corresponding
element whose format condition was generated by the
supplementary information editing module 2209, and displays
the layout condition together with the element name.

With the above processes, each keyword-corresponding
element is determined. A contiguous element checking module
2210 inspects at a certain timing whether an aggregation of
keyword-corresponding elements satisfies the restriction
condition that non-keywords should not be contiguous. The
contiguous element checking module 2210 refers to the
contiguity relationship between string-corresponding
elements described in the string-corresponding element
information 2203, and inspects whether string-corresponding
elements other than the keyword-corresponding elements
(hereinafter called "non-keyword-corresponding elements")
are contiguous. If there is a possibility that two
non-keyword-corresponding elements are contiguous, the
operator generates the layout condition of one of the two
elements and sets it as the keyword-corresponding element.
Conversely, if there is no possibility that
non-keyword-corresponding elements are contiguous, the
keyword-corresponding elements are sufficient at this
timing. At this time, an aggregation of combinations of the
name of each keyword-corresponding element and its format
condition is used as the keyword extraction rule 2212.
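
The restriction condition inspected by the contiguous
element checking module 2210 can be sketched as below. The
contiguity relationship is assumed to be available as a
list of (preceding, succeeding) pairs of
string-corresponding element names; this data layout and
the function name are hypothetical.

```python
def contiguous_non_keywords(contiguity_pairs, keyword_elements):
    """Return the pairs in which both elements are non-keyword elements.

    An empty result means the current set of keyword-corresponding
    elements is sufficient; otherwise one element of each returned pair
    still needs a layout condition.
    """
    return [(a, b) for a, b in contiguity_pairs
            if a not in keyword_elements and b not in keyword_elements]
```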

The outline of the processes of the keyword extraction rule
generating system has been described above. Next, the
details of each process executed by the system shown in
Fig. 21 will be described.

The document structure information extracting module 2202
refers to the modified DTD 2201 such as shown in Fig. 10,
extracts each string-corresponding element and contiguity
possibility information between string-corresponding
elements, and outputs them as the string-corresponding
element information 2203.

The string-corresponding element is an element having
"#PCDATA" representative of a string of the document type
definition (modified DTD) as a constituent of the model
group. Fig. 22 shows the string-corresponding elements of
the modified DTD shown in Fig. 10. In the example shown in
Fig. 22, extracted as the string-corresponding elements are
the elements "OPENINGTITLE", "PROMULGATIONDATE",
"ESTABLISHEDREGULATIONNO.", "PROMULGATIONSTATEMENT",
"TITLE", "ARTICLENO.", "ARTICLESTATEMENT", "PARAGRAPHNO.",
and "PARAGRAPHSTATEMENT".
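
The extraction of string-corresponding elements can be
sketched as a scan over the element declarations of the
DTD. The regular expression below assumes declarations of
the general form <!ELEMENT NAME - - (content model)> with
optional tag-minimization indicators; the actual text of
Fig. 10 is not reproduced, so this form and the function
name are assumptions.

```python
import re

# Name, optional minimization indicators, then the content model.
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(\S+)\s+(?:[-oO]\s+[-oO]\s+)?(.*?)\s*>", re.S)


def string_corresponding_elements(dtd_text):
    """List the elements whose content model contains #PCDATA."""
    return [name for name, model in ELEMENT_DECL.findall(dtd_text)
            if "#PCDATA" in model]
```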

The document structure information extracting module 2202
also checks the possibility that string-corresponding
elements are contiguous. For this purpose, the following
two specific processes are performed.

(1) An aggregation of string-corresponding elements at the
start and end of each element is obtained. For example, in
the structured document shown in Fig. 15, at the start of
the element "PROMULGATION" (1501 to 1506), the
string-corresponding element "PROMULGATIONDATE" (1502 to
1503) appears, and at the end of the element
"PROMULGATION", the string-corresponding element
"PROMULGATIONSTATEMENT" (1504 to 1505) appears. In this
process, the elements capable of appearing at the start and
end of each element are derived from the modified DTD 2201
such as shown in Fig. 10.

(2) A combination of contiguous elements in the model group
of the modified DTD is obtained. For each combination,
there is a contiguity possibility between the
string-corresponding elements capable of appearing at the
end of the preceding element and those capable of appearing
at the start of the succeeding element.

In this embodiment, in order to facilitate the execution of
these two processes, the modified DTD such as shown in
Fig. 10 is converted into BNF (Backus-Naur Form) notation.
This conversion procedure conforms with the rule conversion
regulation 112 (Fig. 12) and is generally the same as the
procedure of converting the modified DTD 108 into the
interim yacc rule 908. However, in this embodiment, which
element is determined as a keyword is not known. Therefore,
the description "#PCDATA" of the modified DTD is not
converted into a description such as [#KEY "ARTICLENO."] or
[#TEXT]. Only in this point does this embodiment differ
from the rule conversion process 906.

Fig. 23 shows an example of the modified DTD expressed by
BNF notation. Also in this embodiment, a rule described in
BNF notation and obtained by converting the definition of
each element of the modified DTD is called a "production
rule". The right side of each production rule is, in this
embodiment, called a "content model" of the left side
element.

The procedure of obtaining, from the modified DTD expressed by BNF
notation, an aggregation of string-corresponding elements at the
start and end of each element will be described. The algorithm of
this procedure is shown in Fig. 24. The procedure starting from A in
Fig. 24 takes an element as its input argument, returns an
aggregation of string-corresponding elements capable of appearing at
the start of the element, and contains a recursive call. The
variables mg and elem used in this procedure are local variables
newly generated each time the procedure advances to A. First[xx] is a
global variable representative of an aggregation of
string-corresponding elements capable of appearing at the start of
the element xx.

In order to obtain an aggregation of string-corresponding elements
capable of appearing at the start of each element, the procedure A is
executed by using the element as the argument (nt in Fig. 24).

In the procedure A, First[nt] is set to an empty aggregation (2501),
First[nt] representing an aggregation of string-corresponding
elements capable of appearing at the start of nt. In the content
model of nt, of the element groups partitioned by an OR-connector
"|", the first element group is substituted into the variable mg
(2502). If the OR-connector does not exist, the whole of the content
model is substituted into the variable mg. The first element of mg is
substituted into the variable elem (2503). Next, it is checked
whether elem is a string-corresponding element (2504). If elem is a
string-corresponding element, elem is added to First[nt] (2505) and
the flow advances to step 2509. If not, it is checked whether
First[elem] has been set (2506); if it has, the content of
First[elem] is added to First[nt] (2508) and the flow advances to
step 2509. If First[elem] is not set at step 2506, elem is used as
the argument and the procedure A is recursively executed (2507). The
return argument, i.e., the content of First[elem], is added to
First[nt] and the flow advances to step 2509.

At step 2509, it is checked from the content model of nt whether mg
is the last element group partitioned by the OR-connector. If not,
the next element group is substituted into the variable mg (2510) and
the flow returns to step 2503. If mg is the last element group, by
using First[nt] as the return argument, the processing is passed to
the procedure which called this procedure A (2511).

The procedure shown in Fig. 24 is performed until First[nt] is set
for all elements. In this manner, an aggregation of
string-corresponding elements capable of appearing at the start of
each element can be obtained. An aggregation Last[] of
string-corresponding elements capable of appearing at the end of each
element can be obtained in a similar manner to the procedure shown in
Fig. 24, by replacing the factors shown in Fig. 24 by the following
two factors.

(a) First[xxx] in Fig. 24 is replaced by Last[xxx].

(b) The first element at step 2503 is replaced by the last element.
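The procedure of Fig. 24 and its Last[] variant can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary representation of the production rules, the function names, and the reduced grammar fragment are assumptions made for the example (Fig. 23 itself is not reproduced here). Elements that have no production rule are treated as string-corresponding elements.

```python
from typing import Dict, List, Set

# Assumed representation: element -> list of element groups (the parts
# separated by the OR-connector), each group being a sequence of elements.
Grammar = Dict[str, List[List[str]]]


def edge_sets(grammar: Grammar, last: bool = False) -> Dict[str, Set[str]]:
    """Compute First[] for every element, or Last[] when last=True."""
    result: Dict[str, Set[str]] = {}

    def procedure_a(nt: str) -> Set[str]:
        result[nt] = set()                       # step 2501
        for mg in grammar[nt]:                   # steps 2502, 2509, 2510
            elem = mg[-1] if last else mg[0]     # step 2503, factor (b) for Last[]
            if elem not in grammar:              # step 2504: string-corresponding
                result[nt].add(elem)             # step 2505
            elif elem in result:                 # step 2506: already set
                result[nt] |= result[elem]       # step 2508
            else:
                result[nt] |= procedure_a(elem)  # step 2507: recursive call
        return result[nt]                        # step 2511

    for nt in grammar:                           # repeat until set for all elements
        if nt not in result:
            procedure_a(nt)
    return result


# Hypothetical, deliberately reduced fragment modelled on Fig. 15.
GRAMMAR: Grammar = {
    "LAW": [["CHANGE"]],
    "CHANGE": [["OPENINGTITLE", "PROMULGATION", "TITLE"]],
    "PROMULGATION": [
        ["PROMULGATIONDATE", "ESTABLISHEDREGULATIONNO.", "PROMULGATIONSTATEMENT"],
        ["PROMULGATIONDATE", "ESTABLISHEDREGULATIONNO."],
    ],
}

first = edge_sets(GRAMMAR)
last = edge_sets(GRAMMAR, last=True)
```

With this fragment, Last["PROMULGATION"] contains both "PROMULGATIONSTATEMENT" and "ESTABLISHEDREGULATIONNO.", because the second element group ends with the latter.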

Fig. 25 shows First[] and Last[], the aggregations of
string-corresponding elements capable of appearing at the start and
end of each element of the modified DTD shown in Fig. 10.

With the above procedures, it becomes possible to obtain the
aggregation First[] of string-corresponding elements capable of
appearing at the start of each element and the aggregation Last[] of
string-corresponding elements capable of appearing at the end of each
element.

Next, the combinations of contiguous elements in the content model of
the document structure definition are obtained. For each combination,
a component of Last[] of the preceding element may be contiguous with
a component of First[] of the succeeding element. An example of this
process is illustrated in Fig. 26, in which the production rule
"CHANGE: OPENINGTITLE PROMULGATION TITLE" 2402 shown in Fig. 23 is
processed. In this production rule of the content model of the
element "LAW", the elements "OPENINGTITLE" and "PROMULGATION" are
contiguous and the elements "PROMULGATION" and "TITLE" are contiguous
(2701). Therefore, the element in First[PROMULGATION] can be backward
contiguous with the element in Last[OPENINGTITLE] (2702). Namely, the
string-corresponding element "PROMULGATIONDATE" can be backward
contiguous with the string-corresponding element "OPENINGTITLE"
(2704). The element in First[TITLE] can be backward contiguous with
the element in Last[PROMULGATION] (2703). Namely, the
string-corresponding element "TITLE" can be backward contiguous with
both the string-corresponding elements "PROMULGATIONSTATEMENT" and
"ESTABLISHEDREGULATIONNO." (2705). This process is applied to all
production rules in the document structure definition expressed in
BNF notation. Therefore, an aggregation of all string-corresponding
elements capable of being backward contiguous can be obtained, and
this aggregation is the string-corresponding element information
(2203 in Fig. 21). An example of the string-corresponding element
information 2203 is shown in Fig. 27.

With the procedure described with the drawings up to Fig. 26, the
document structure information extracting module 2202 can generate
the string-corresponding element information 2203.
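The derivation of backward-contiguous pairs from the start and end aggregations might be sketched as below. The function name, the pair representation, and the literal start/end tables are assumptions for illustration; only the "CHANGE" production rule and the resulting pairs (2704, 2705) are taken from the description above.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]


def backward_contiguous(
    grammar: Grammar,
    first: Dict[str, Set[str]],
    last: Dict[str, Set[str]],
) -> Set[Tuple[str, str]]:
    """Return (preceding, succeeding) pairs of string-corresponding elements."""

    def edge(elem: str, table: Dict[str, Set[str]]) -> Set[str]:
        # A string-corresponding element has no table entry; it starts and
        # ends with itself.
        return table.get(elem, {elem})

    pairs: Set[Tuple[str, str]] = set()
    for groups in grammar.values():
        for group in groups:
            # Every pair of neighbouring elements in a content model (2701).
            for prev, succ in zip(group, group[1:]):
                for a in edge(prev, last):
                    for b in edge(succ, first):
                        pairs.add((a, b))
    return pairs


# The production rule discussed above; the tables are written out by hand.
RULES: Grammar = {"CHANGE": [["OPENINGTITLE", "PROMULGATION", "TITLE"]]}
FIRST = {"PROMULGATION": {"PROMULGATIONDATE"}}
LAST = {"PROMULGATION": {"PROMULGATIONSTATEMENT", "ESTABLISHEDREGULATIONNO."}}

pairs = backward_contiguous(RULES, FIRST, LAST)
```

Here `pairs` holds three entries: "PROMULGATIONDATE" following "OPENINGTITLE", and "TITLE" following either "PROMULGATIONSTATEMENT" or "ESTABLISHEDREGULATIONNO.".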

Next, the process of the layout information extract- 
ing module 2205 shown in Fig. 21 for extracting the lay- 
out information 2206 from the layout definition 2204 will 
be described. 

The layout definition 2204 is set for subject documents and defines
with what layout each element is output. Fig. 28 shows an example of
part of the layout definition prepared for structured documents
conforming with the document type definition (DTD). Reference numeral
2901 indicates that reference numerals 2901 to 2911 represent the
layout definitions of the element "TITLE". A [font name] 2902
indicates that the font name used for outputting "TITLE" is Gothic,
and a [font size] 2903 indicates that the font size is 12 pt (point,
a unit of length where 1 pt = 1/72 inch). A [character pitch]
indicates that the character pitch of "TITLE" is 14 pt. An [offset 1]
2905 and an [offset 2] 2906 indicate what minimum spaces from the
right and left sides of a region where a document is output are
reserved for outputting the content of "TITLE". A [first-line
displacement] 2907 indicates a difference from the [offset 1] of an
offset of the first line, which often takes a different offset from
other lines. A [connection with previous element] 2908 indicates
which string is output after the element just before. In this
example, after the element just before is output, "TITLE" is output
on a new line after a line feed. A [string information] 2909
describes which string is output. In this example, a string CONTENT
corresponding to "TITLE", i.e., the string between the tag <TITLE>
and the tag </TITLE>, is output. A [placement] 2910 indicates how
strings are placed in the area defined by the [offset 1] and [offset
2]. This [placement] 2910 takes four values "start", "end", "center",
and "justify", corresponding to left alignment, right alignment,
centering, and equal space. In this example, the string of "TITLE" is
output through centering.

Such layout definitions are essentially used for outputting a
structured document and are not used for expressing the layout of a
non-structured document. However, for a document having a regular
layout, such as a legal document, the layout definition is often
determined in accordance with the layout regularity. Most of the
layout and string information in the layout definition of such a
document can be used for extracting keywords from the non-structured
document.

The layout information extracting module 2205 refers to the layout
definition 2204 and extracts as many as possible of the items
necessary for extracting a keyword from the information of layout and
string used for outputting each element. As described earlier, such
an item itself is called a "required item", and the information
extracted for each item is called a "required item content".

Fig. 29 shows an example of required items for each keyword when the
keyword rule shown in Fig. 5 is generated. An [element name] 3001 is
the name of a subject string-corresponding element and takes a value
of a string. A [left-hand space] 3002 and a [right-hand space] 3003
indicate the conditions of what minimum character spaces from the
right and left sides of a region where a document is output are
reserved for outputting the string of the element. A [first-line
indent] 3004 indicates what character spaces at the left side are
reserved at the first line, which often takes a different offset from
other lines. A [string condition] 3005 indicates what string
describes the keyword. An [arrangement] 3006 indicates how keywords
are arranged in the region defined by the [left-hand space] 3002 and
[right-hand space] 3003. This [arrangement] 3006 takes four values
"right justify", "left justify", "centering" and "equal space". A
[previous string] 3007 and a [next string] 3008 indicate what strings
are sandwiched between the subject keyword and the
string-corresponding elements appearing before and after it.

The layout information extracting module 2205 refers to the layout
definition 2204 and extracts information of the required items shown
in Fig. 29, i.e., the required item contents, as much as possible.
Fig. 30 illustrates an example of a process of extracting the
required item contents from the layout definition shown in Fig. 28.

In order to extract the required item content of a
string-corresponding element, the definition of the
string-corresponding element in the layout definition is used. For
example, the required item content for "ARTICLENO." is extracted from
the definitions 2912 to 2922 of "ARTICLENO." shown in Fig. 28.

The required items [left-hand space] and [right-hand space] indicate
the same contents as the [offset 1] and [offset 2] of the layout
definition. Therefore, only the unit of length is changed from pt to
the number of characters. Specifically, the values of the [offset 1]
and [offset 2] are divided by the value of the [character pitch]
(3101 and 3102). The required item [first-line indent] has the
content of the sum of the [offset 1] and the [first-line
displacement] in the layout definition, divided by the [character
pitch] (3103). The content of the required item [string condition] is
generated by referring to the [string information] in the layout
definition (3104). However, in the example shown in Fig. 28, the
[string information] is "CONTENT" for all elements, so that the
string in the document instance itself is output and specific
information of a string cannot be obtained from the layout
definition. Since the required item [arrangement] represents the same
concept as the [placement] in the layout definition, the values are
converted in accordance with the rules 3105. Into the content of the
required item [previous string], the content of the [connection with
previous element] is substituted (3106).
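The conversions 3101 to 3106 can be illustrated with a short sketch. The dictionary keys, the function name, the offset values, and in particular the placement-to-arrangement table are assumptions: the rules 3105 are not reproduced in the text, so the table is inferred from the correspondence of "start", "end", "center" and "justify" to left alignment, right alignment, centering and equal space. How fractional character counts are rounded is not stated, so plain division is used.

```python
from typing import Dict, Optional, Union

Value = Union[int, float, str, None]

# Assumed stand-in for the conversion rules 3105.
PLACEMENT_TO_ARRANGEMENT = {
    "start": "left justify",
    "end": "right justify",
    "center": "centering",
    "justify": "equal space",
}


def required_items(name: str, layout: Dict[str, Value]) -> Dict[str, Value]:
    """Derive required item contents from one element's layout definition."""
    pitch = layout["character pitch"]
    info = layout["string information"]
    # "CONTENT" carries no specific string, so no condition can be derived.
    string_condition: Optional[Value] = None if info == "CONTENT" else info
    return {
        "element name": name,
        "left-hand space": layout["offset 1"] / pitch,                # 3101
        "right-hand space": layout["offset 2"] / pitch,               # 3102
        "first-line indent":
            (layout["offset 1"] + layout["first-line displacement"]) / pitch,  # 3103
        "string condition": string_condition,                         # 3104
        "arrangement": PLACEMENT_TO_ARRANGEMENT[layout["placement"]], # 3105
        "previous string": layout["connection with previous element"],  # 3106
    }


# Hypothetical layout definition for "TITLE"; only the 14 pt pitch, the
# centering, the line feed and "CONTENT" come from the description.
title_layout = {
    "character pitch": 14,
    "offset 1": 28,
    "offset 2": 28,
    "first-line displacement": 0,
    "connection with previous element": "\n",
    "string information": "CONTENT",
    "placement": "center",
}

title_items = required_items("TITLE", title_layout)
```

The [string condition] stays unset in this example, which is the item the operator later supplies through the required item editor.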

The content of the required item [next string] is obtained by using
the string-corresponding element information and the [connection with
previous element] of other elements in the layout definition (3107).
Specifically, first a string-corresponding element (hereinafter
called a "next element") backward contiguous with the subject
string-corresponding element is obtained by using the
string-corresponding element information. Next, the [connection with
previous element] is checked for all next elements, and if the
contents of all next elements are the same, this content is set as
the content of the [next string] of the subject element. If there is
a next element having a different content of the [connection with
previous element], the content of the [next string] is not set. For
example, from the string-corresponding element information shown in
Fig. 27 at 2806, it can be known that the next element of
"ARTICLENO." is only "ARTICLESTATEMENT". The content of the [next
string] of "ARTICLENO." is therefore " ", the [connection with
previous element] of "ARTICLESTATEMENT".
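A minimal sketch of step 3107 follows, assuming the string-corresponding element information is held as a set of (preceding, succeeding) pairs and the [connection with previous element] values as a dictionary. Both representations, the function name, and the second example are illustrative assumptions.

```python
from typing import Dict, Optional, Set, Tuple


def next_string(
    element: str,
    contiguity: Set[Tuple[str, str]],
    connection: Dict[str, str],
) -> Optional[str]:
    """Return the [next string] of element, or None when it cannot be set."""
    # All elements that can be backward contiguous with the subject element.
    next_elements = {succ for prev, succ in contiguity if prev == element}
    # Their [connection with previous element] values must all agree.
    values = {connection[e] for e in next_elements}
    return values.pop() if len(values) == 1 else None


# Example from the description: the only next element of "ARTICLENO." is
# "ARTICLESTATEMENT", whose connection with the previous element is a blank.
contiguity = {("ARTICLENO.", "ARTICLESTATEMENT")}
connection = {"ARTICLESTATEMENT": " "}
article_next = next_string("ARTICLENO.", contiguity, connection)

# Hypothetical case with two next elements that disagree: nothing is set.
mixed = next_string(
    "X",
    {("X", "Y"), ("X", "Z")},
    {"Y": " ", "Z": "\n"},
)
```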

The above processes are executed for all string- 
corresponding elements to generate the layout informa- 
tion 2206 shown in Fig. 21. 

The keyword extraction rule generating module 2207 shown in Fig. 21
informs an operator, via the input/output device 2211, of the
string-corresponding element information 2203 and the layout
information 2206. This module 2207 receives supplementary information
from the operator to add and modify the required item contents and
generates the keyword extraction rule 2212. A specific process of the
keyword extraction rule generating module 2207 will be described.

The keyword information indicator module 2208 informs the operator of
the string-corresponding element names and of which
string-corresponding elements are set as keyword-corresponding
elements at that time. If the operator instructs that a particular
string-corresponding element be set as a keyword-corresponding
element, the keyword information indicator module 2208 activates the
supplementary information editing module 2209, which supplements the
required item content of the string-corresponding element. If the
operator instructs an inspection of whether the
keyword-corresponding elements set at that time satisfy the
restriction condition that non-keywords should not be contiguous, the
contiguous element checking module 2210 is activated.

Fig. 31 shows an example of an interface with which the keyword
information indicator module 2208 displays information on the
input/display device 2211 for the operator, and Fig. 32 is its
process flow. The operation of the keyword information indicator
module 2208 will be described with reference to Figs. 31 and 32. Upon
activation, the keyword information indicator module 2208 reads the
string-corresponding element information 2203 and obtains the name of
each string-corresponding element (3301). Reference numeral 3201
represents a keyword information window, which is constituted by an
element name display area 3202 for displaying the names of all
string-corresponding elements and a format condition display area
3203 for displaying the format condition of each string-corresponding
element set as a keyword-corresponding element. At step 3302, the
string-corresponding element names and the layout conditions of the
elements set as keyword-corresponding elements at that time are
displayed. At the initial stage, no format condition is set for any
element, so that the format condition display area 3203 displays no
information. In order to give a format condition to a
string-corresponding element and set this element as a
keyword-corresponding element, the operator first double-clicks the
element name in the element name display area 3202 with a mouse to
thereby activate the supplementary information editing module (2209
in Fig. 21) (3304). The detailed operation of the supplementary
information editing module 2209 will be given later. The
string-corresponding element name is passed to the supplementary
information editing module 2209, and its format condition is received
as the return argument. The string-corresponding element designated
by the operator is set as a keyword-corresponding element (3305) and
its format condition is displayed in the format condition display
area 3203 (3302). The example in Fig. 31 shows the display of the
interface at a certain time. At this time, format conditions are
given to the two string-corresponding elements "TITLE" 3206 and
"PARAGRAPHNO." 3207, which means that these two string-corresponding
elements are set as keyword-corresponding elements.

Reference numeral 3204 represents a button for checking contiguous
elements. When this button 3204 is clicked, the contiguous element
checking module (2210 in Fig. 21) is activated, which inspects
whether the aggregation of keyword-corresponding elements set at that
time satisfies the restriction condition that non-keywords should not
be contiguous (3306). The operation of the contiguous element
checking module 2210 will be described later. If the inspection
judges that keyword-corresponding elements satisfying the restriction
condition are set, the operator clicks an exit button to instruct
termination of the process of the keyword information indicator
module 2208. The keyword information indicator module 2208 outputs
the keyword-corresponding element names and their format conditions
as the keyword extraction rule (2212 in Fig. 21) and terminates the
process (3307). The contents of the processes by the keyword
information indicator module 2208 have been described above.

Fig. 33 shows an example of an interface of the supplementary
information editing module 2209 activated when the element name is
double-clicked during the operation of the keyword information
indicator module 2208, and Fig. 34 shows the process flow. The
supplementary information editing module 2209 reads the name of the
string-corresponding element set as the keyword-corresponding element
whose layout condition is to be set, the name being passed from the
keyword information indicator module 2208 (3501), and reads the
required item content of the element from the layout information
(2206 in Fig. 21) (3502). The required item content is displayed on a
required item editor 3401 (3503). The required item editor 3401
consists of windows in which the display content can be edited. If
the displayed contents differ from the description format of the
non-structured document, the operator changes them. Since a required
item content (e.g., [string condition] in the extraction example
shown in Figs. 30 and 31) which cannot be extracted by the layout
information extracting module 2205 is not displayed on the required
item editor, the operator enters the required item content into the
required item editor (3504 to 3503). An example after the [string
condition] is entered is shown in Fig. 30 under the title of "after
entering string condition".

After the required item contents are edited and all the required item
contents match the description format of the non-structured document,
the operator clicks an exit button 3402 to instruct the termination
of the processes of the supplementary information editing module
2209. The supplementary information editing module 2209 generates the
format conditions from the edited required item contents of the
string-corresponding elements set as the keyword-corresponding
elements (3506), and passes the format conditions as the return
argument to the keyword information indicator module 2208 (3507). The
process flow of generating the format condition from the required
item content is shown in Fig. 35. In Fig. 35, the steps surrounded by
broken lines show an example in which the required item content of
"ARTICLENO." shown under the title of "after entering string
condition" is converted into the format condition.

First, the content (e.g., "ARTICLE"NUM1) of the required item [string
condition] is substituted into the format condition, and it is
checked whether the content of the required item [previous string] is
line feed (3601). If line feed, the flow advances to step 3603,
whereas if not, the format condition is surrounded by "[" and "]",
and "+" and the content of the [previous string] are added just
before it (3602). In this case, a blank is converted into
SPC[integer]. Next, at step 3603, it is checked whether the content
of the required item [next string] is line feed. If line feed, "$" is
added to the end of the format condition (3605) and the flow advances
to step 3606, whereas if not, the format condition is surrounded by
"[" and "]" if it does not already contain "[" and "]", and the
content of the [next string] and "+" are added just after it (3604,
e.g., ["ARTICLE"NUM1] SPC1 +). At step 3606, it is checked whether
the content of the required item [arrangement] is "centering" or not.
If "centering", "C" is added to the start of the format condition
(3607) and the generation of the format condition is terminated. If
not "centering", the flow advances to step 3608 and the process A or
B is executed depending upon the content of the [arrangement]. If the
content of the [arrangement] is "left justify", the process A is
performed; if "right justify", the process B is performed; and if
"equal space", both the processes A and B are performed, to
thereafter terminate the generation of the format condition. In the
process A, "^SPCx" is added to the start of the format condition
(3609), where x is the content of the [first-line indent] (e.g.,
^SPC0 ["ARTICLE"NUM1] SPC1 +). In the process B, first "SPCy$" is
added to the end of the format condition (3610), where y is the
content of the [right-hand space]. Next if "^" or "+" at the start of
the format condition, "!" is added to the start of the format
condition (3611).

The supplementary information editing module 2209 passes the obtained
format condition as the return argument to the keyword information
indicator module (3507 in Fig. 34), which in turn continues its
process. The above description covers the contents of the processes
by the supplementary information editing module 2209.

Fig. 36 shows the process flow of the contiguous element checking
module 2210 activated when the contiguity check button is clicked
during the operation of the keyword information indicator module
(2208 in Fig. 21), and Fig. 37 shows an example of its processes. The
contiguous element checking module 2210 first reads the
keyword-corresponding elements given by the keyword information
indicator module 2208 (3701, e.g., 3801). Next, it reads the
string-corresponding element information (2203 in Fig. 21) (3702).
Then, the non-keyword-corresponding elements are obtained as the
aggregation of all string-corresponding elements minus the
keyword-corresponding elements (3703, e.g., 3802). At step 3704, by
referring to the string-corresponding element information, it is
checked whether any non-keyword-corresponding element has another
non-keyword-corresponding element as its next element (e.g., 3803).
If there is such a non-keyword-corresponding element, the operator is
informed of the contiguous non-keyword-corresponding elements (3705,
e.g., 3804) and the process is terminated. If there is not, the
operator is informed to that effect (3706) and the process is
terminated. The above description covers the process contents of the
contiguous element checking module 2210.
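Steps 3703 and 3704 reduce to a set difference followed by a filter over the contiguity pairs. The sketch below assumes the same pair representation of the string-corresponding element information as a set of (preceding, succeeding) tuples; the function name and the element chain are hypothetical and are not the contents of Fig. 27 or Fig. 37.

```python
from typing import Iterable, List, Set, Tuple


def contiguous_non_keywords(
    all_elements: Iterable[str],
    keywords: Iterable[str],
    contiguity: Set[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the pairs in which a non-keyword directly follows a non-keyword."""
    non_keywords = set(all_elements) - set(keywords)       # step 3703
    return sorted(                                         # step 3704
        (prev, succ)
        for prev, succ in contiguity
        if prev in non_keywords and succ in non_keywords
    )


# Hypothetical chain of string-corresponding elements.
ELEMENTS = {
    "TITLE", "ARTICLENO.", "ARTICLESTATEMENT",
    "PARAGRAPHNO.", "PARAGRAPHSTATEMENT",
}
CONTIGUITY = {
    ("TITLE", "ARTICLENO."),
    ("ARTICLENO.", "ARTICLESTATEMENT"),
    ("ARTICLESTATEMENT", "PARAGRAPHNO."),
    ("PARAGRAPHNO.", "PARAGRAPHSTATEMENT"),
}

# With only two keywords set, one offending pair is reported (step 3705).
violations = contiguous_non_keywords(
    ELEMENTS, {"TITLE", "PARAGRAPHNO."}, CONTIGUITY
)
# Adding "ARTICLENO." as a keyword satisfies the restriction (step 3706).
resolved = contiguous_non_keywords(
    ELEMENTS, {"TITLE", "PARAGRAPHNO.", "ARTICLENO."}, CONTIGUITY
)
```

An empty result corresponds to the case in which the operator is told that the restriction condition is satisfied.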

With this embodiment, the keyword extraction rule can be generated.
The programs described with this embodiment may be stored in a
storage such as a hard disk, a floppy disk, an optical disk, and a
CD-ROM.

Claims 

1. A method of generating a structured document for a structured
document generating apparatus having at least an input/output device
(1), a control unit (3), and a repository (2), wherein a
non-structured document (101) not explicitly given the document
structure and input from said input/output device is converted into a
structured document (116) explicitly given the document structure, in
accordance with a document structure definition defining the document
structure, said method comprising the steps of:

modifying a given first document structure definition (106) so as to
match the document structure of said input non-structured document
(101) and generate a second document structure definition (107);

said control unit (3) generating a parsing rule (111) used for
performing a parsing process suitable for the document structure of
said second document structure definition, by modifying marks
constituting said second document structure definition and modifying
said second document structure definition so as to make the
positional order of said marks in one-to-one correspondence;

in accordance with said generated parsing rule, generating a first
structured document (114) from said input non-structured document;
and

in accordance with difference data between said first document
structure definition and said second document structure definition,
converting said generated first structured document into a format
matching said first document structure definition to thereby generate
a second structured document (116).

2. A method of generating a structured document according to claim 1,
wherein said first and second document structure definitions (106,
107) include mark trains disposed for defining the relationship
between character strings constituting a document to be input.

3. A method of generating a structured document according to claim 2,
wherein said parsing rule (111) is generated by embedding a process
of explicitly giving the parsed portion of document structure to be
parsed, into an interim rule generated by converting said second
document structure definition in accordance with a given rule
conversion regulation (112).

4. A method of generating a structured document according to claim 2,
wherein the mark strings of said first and second document structure
definitions (106, 107) describe the document structure, representing
a conceptional relationship between the character strings of a
document to be input by disposing names representing the concept of
each character string.

5. A method of generating a structured document according to claim 2,
further comprising the steps of:

extracting a keyword from said non-structured document in accordance
with a predetermined rule (103) regarding the character strings of a
document to be input and generating a keyword/text model (104)
including at least character strings extracted as keywords and other
character strings; and

converting said keyword/text model into said first structured
document (114) by using said parsing rule.

6. A method of generating a structured document 
according to claim 5, wherein if the same character 
string in the same character region is extracted as a 
plurality of keywords, said control unit (3) selects a 
proper one from the plurality of keywords in accord- 
ance with whether the parsing process succeeds or 
fails. 

7. A method of generating a structured document according to claim 5,
wherein said keyword is extracted by analyzing each character string
in said non-structured document (101) with reference to a keyword
extraction rule (103) having a correspondence between a format
condition of each character string and a keyword name.

8. A method of generating a structured document 
according to claim 7, wherein said keyword extrac- 
tion rule (103) is generated, if a layout definition of 
said non-structured document is given, by modify- 
ing said layout definition in accordance with a pre- 
determined rule. 

9. A structured document generating apparatus having at least an
input/output device (1), a control unit (3), and a repository (2),
wherein a non-structured document (101) not explicitly given the
document structure is converted into a structured document (116)
explicitly given the document structure, comprising:

keyword extracting means (102) for extracting as a keyword a
character string representative of a constituent element of the
document structure of said non-structured document in accordance with
layout information about layout and character string information of
said non-structured document;

rule generating means (110) for generating a rule from a second
document structure definition obtained by modifying a given first
document definition, said rule being used for converting said
non-structured document into said structured document matching said
second document structure; and

structured document generating means (113, 105, 115) for generating
said structured document by using the keyword extracted by said
keyword extracting means and the rule generated by said rule
generating means.

10. A method of extracting a keyword of a particular character string
representing a constituent element of the structure of a document,
comprising the steps of:

extracting (2202) document structure information from a document
structure information given in advance to a non-structured document
and generating string-corresponding element information, the
string-corresponding element which is element of document structure
constituting each character string of said non-structured document;

generating layout information (2203) from a layout definition given
to said non-structured document, the layout definition defining an
output format of said non-structured document; and

extracting the keyword (2208) in accordance with a rule made from
said string-corresponding element information and said layout
information.

11. A method of extracting a keyword according to claim 10, wherein
said step of generating the string-corresponding element information
generates as the string-corresponding element information
contiguity-relationship between said string-corresponding element.

12. A method of extracting a keyword according to claim 10, wherein
said step of generating the layout information generates as the
layout information a layout used when the constituent element of the
document structure is output, and information of each character
string.
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FIG.15 

<LAW> 

< CHANGE > 

< OPENINGTITLE > 

OAAPREFECTURE FLOOD DEFENCE SIGNAL REGULATION
</OPENINGTITLE>
1501 <PROMULGATION>
1502 <PROMULGATIONDATE>
SHOWA 24, OCTOBER, 6
1503 </PROMULGATIONDATE>
<ESTABLISHEDREGULATIONNO.>
AAPREFECTURE REGULATION NO. 78
</ESTABLISHEDREGULATIONNO.>
1504 <PROMULGATIONSTATEMENT>
AAPREFECTURE FLOOD DEFENCE SIGNAL REGULATION IS
TO BE PROMULGATED AS IN THE FOLLOWING
1505 </PROMULGATIONSTATEMENT>
1506 </PROMULGATION>
<TITLE>
AAPREFECTURE FLOOD DEFENCE SIGNAL REGULATION
</TITLE>
</CHANGE>
<PRESENTREGULATION>
<ARTICLE>
<ARTICLENO.>
ARTICLE 1
</ARTICLENO.>
<FIRSTPARAGRAPH>
<FIRSTPARAGRAPHSTATEMENT>
FLOOD DEFENCE SIGNALS STIPULATED IN ARTICLE 13,
PARAGRAPH 1 OF THE FLOOD DEFENCE LAW
(SHOWA 24, JUNE, LAW NO. 193) INCLUDE THE FOLLOWING.
</FIRSTPARAGRAPHSTATEMENT>
<PARAGRAPH>
<PARAGRAPHNO.>
(1)
</PARAGRAPHNO.>
<PARAGRAPHSTATEMENT>
FIRST SIGNAL: FOR NOTIFYING AN ALARM WATER LEVEL
</PARAGRAPHSTATEMENT>
</PARAGRAPH>
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FIG.17 

<LAW> 

< PROMULGATION > 

< PROMULGATIONSTATEMENT > 

AAPREFECTURE FLOOD DEFENCE SIGNAL REGULATION
TO BE PROMULGATED AS IN THE FOLLOWING
</PROMULGATIONSTATEMENT>
<PROMULGATIONDATE>
SHOWA 24, OCTOBER, 6
</PROMULGATIONDATE>
<PROMULGATIONOFFICER>
<OFFICIALTITLE>
[NONE]
</OFFICIALTITLE>
<NAME>
[NONE]
</NAME>
</PROMULGATIONOFFICER>
</PROMULGATION>
<ESTABLISHEDREGULATIONNO.>
AAPREFECTURE REGULATION NO. 78
</ESTABLISHEDREGULATIONNO.>
<TITLE>
AAPREFECTURE FLOOD DEFENCE SIGNAL REGULATION
</TITLE>
<PRESENTREGULATION>
<ARTICLE>
<ARTICLENO.>
ARTICLE 1
</ARTICLENO.>
<FIRSTPARAGRAPH>
<FIRSTPARAGRAPHSTATEMENT>
FLOOD DEFENCE SIGNALS STIPULATED IN ARTICLE 13,
PARAGRAPH 1 OF THE FLOOD DEFENCE LAW
(SHOWA 24, JUNE, LAW NO. 193) INCLUDE THE FOLLOWING
</FIRSTPARAGRAPHSTATEMENT>
<PARAGRAPH>
<PARAGRAPHNO.>
(1)
</PARAGRAPHNO.>
<PARAGRAPHSTATEMENT>
FIRST SIGNAL: FOR NOTIFYING AN ALARM WATER LEVEL
</PARAGRAPHSTATEMENT>
FIG. 21

[Block diagram; reference numerals 2201 to 2209. Block labels:
LAYOUT DEFINITION; DOCUMENT STRUCTURE DEFINITION;
LAYOUT INFORMATION EXTRACTING MODULE;
DOCUMENT STRUCTURE INFORMATION EXTRACTING MODULE;
LAYOUT INFORMATION; STRING-CORRESPONDING ELEMENT INFORMATION;
SUPPLEMENTARY INFORMATION EDITING MODULE;
KEYWORD INFORMATION INDICATOR MODULE;
CONTIGUOUS ELEMENT CHECKING MODULE;
KEYWORD EXTRACTION RULE GENERATING MODULE;
INPUT/DISPLAY DEVICE.]

FIG. 24

[Flowchart for obtaining First[nt]; reference numerals 2501 to 2511.
Legible box text, in reading order:]
  ARGUMENT ELEMENT = nt; First[nt] = {φ}
  IN nt CONTENT MODEL, mg = FIRST ELEMENT GROUP PARTITIONED BY OR-CONNECTOR
  elem = FIRST ELEMENT OF mg
  ADD elem TO First[nt]
  TO (connector) USING elem AS ARGUMENT; RETURN ARGUMENT = First[elem]
  ADD CONTENT OF First[elem] TO First[nt]
  mg IS LAST ELEMENT GROUP?
    NO:  mg = NEXT ELEMENT GROUP PARTITIONED BY OR-CONNECTOR
    YES: PROCESSING RETURNS USING First[nt] AS RETURN ARGUMENT

FIG. 25

First[LAW]                         = {OPENING TITLE}
First[CHANGE]                      = {OPENING TITLE}
First[OPENING TITLE]               = {OPENING TITLE}
First[PROMULGATION]                = {PROMULGATION DATE}
First[opt0]                        = {PROMULGATION STATEMENT}
First[PROMULGATION DATE]           = {PROMULGATION DATE}
First[ESTABLISHED REGULATION NO.]  = {ESTABLISHED REGULATION NO.}
First[PROMULGATION STATEMENT]      = {PROMULGATION STATEMENT}
First[TITLE]                       = {TITLE}
First[PRESENT REGULATION]          = {ARTICLE NO.}
First[plus0]                       = {ARTICLE NO.}
First[ARTICLE]                     = {ARTICLE NO.}
First[ARTICLE NO.]                 = {ARTICLE NO.}
First[ARTICLE STATEMENT]           = {ARTICLE STATEMENT}
First[rep0]                        = {PARAGRAPH NO.}
First[PARAGRAPH]                   = {PARAGRAPH NO.}
First[PARAGRAPH NO.]               = {PARAGRAPH NO.}
First[PARAGRAPH STATEMENT]         = {PARAGRAPH STATEMENT}

Last[LAW]                          = {ARTICLE NO. ARTICLE STATEMENT}
Last[CHANGE]                       = {TITLE}
Last[OPENING TITLE]                = {OPENING TITLE}
Last[PROMULGATION]                 = {PROMULGATION STATEMENT, ESTABLISHED REGULATION NO.}
Last[opt0]                         = {PROMULGATION STATEMENT}
Last[PROMULGATION DATE]            = {PROMULGATION DATE}
Last[ESTABLISHED REGULATION NO.]   = {ESTABLISHED REGULATION NO.}
Last[PROMULGATION STATEMENT]       = {PROMULGATION STATEMENT}
Last[TITLE]                        = {TITLE}
Last[PRESENT REGULATION]           = {ARTICLE STATEMENT, PARAGRAPH STATEMENT}
Last[plus0]                        = {ARTICLE STATEMENT, PARAGRAPH STATEMENT}
Last[ARTICLE]                      = {ARTICLE STATEMENT, PARAGRAPH STATEMENT}
Last[ARTICLE NO.]                  = {ARTICLE NO.}
Last[ARTICLE STATEMENT]            = {ARTICLE STATEMENT}
Last[rep0]                         = {PARAGRAPH STATEMENT}
Last[PARAGRAPH]                    = {PARAGRAPH STATEMENT}
Last[PARAGRAPH NO.]                = {PARAGRAPH NO.}
Last[PARAGRAPH STATEMENT]          = {PARAGRAPH STATEMENT}

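The First-set procedure of FIG. 24 can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the `CONTENT_MODELS` table is a hypothetical reading of the document structure inferred from the sets listed in FIG. 25, the helper elements `opt0`, `plus0` and `rep0` are modelled simply by their single child, elements without a content model are treated as string-corresponding elements whose First set is the element itself, and content models are assumed not to be left-recursive.

```python
from typing import Dict, List, Set

# Hypothetical content models: element -> list of OR-alternatives,
# each alternative being a sequence of child element names.
# Elements missing from this table are treated as string-corresponding
# (leaf) elements. This table is an assumption, not taken from a DTD.
CONTENT_MODELS: Dict[str, List[List[str]]] = {
    "LAW": [["CHANGE", "PRESENT REGULATION"]],
    "CHANGE": [["OPENING TITLE", "PROMULGATION", "TITLE"]],
    "PROMULGATION": [["PROMULGATION DATE", "ESTABLISHED REGULATION NO.", "opt0"]],
    "opt0": [["PROMULGATION STATEMENT"]],
    "PRESENT REGULATION": [["plus0"]],
    "plus0": [["ARTICLE"]],
    "ARTICLE": [["ARTICLE NO.", "ARTICLE STATEMENT", "rep0"]],
    "rep0": [["PARAGRAPH"]],
    "PARAGRAPH": [["PARAGRAPH NO.", "PARAGRAPH STATEMENT"]],
}


def first_set(nt: str, models: Dict[str, List[List[str]]] = CONTENT_MODELS) -> Set[str]:
    """Return the set of leaf elements that can begin element nt."""
    if nt not in models:
        # A string-corresponding element begins with itself.
        return {nt}
    result: Set[str] = set()
    # Visit each element group separated by the OR-connector.
    for group in models[nt]:
        elem = group[0]  # first element of the group
        if elem in models:
            # Recurse and merge First[elem] into First[nt].
            result |= first_set(elem, models)
        else:
            result.add(elem)
    return result


if __name__ == "__main__":
    for name in ("LAW", "PROMULGATION", "PRESENT REGULATION", "rep0"):
        print(f"First[{name}] = {sorted(first_set(name))}")
```

With this assumed table, the function returns {OPENING TITLE} for LAW and {ARTICLE NO.} for PRESENT REGULATION, which agrees with the First entries listed in FIG. 25.
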
FIG. 28

2901  TITLE {
2902    [FONT NAME] GOTHIC
2903    [FONT SIZE] 12pt
2904    [CHARACTER PITCH] 14pt
2905    [OFFSET 1] 0pt
2906    [OFFSET 2] 0pt
2907    [FIRST LINE DISPLACEMENT] 0pt
2908    [CONNECTION WITH PREVIOUS ELEMENT] "¥n"
2909    [STRING INFORMATION] CONTENT
2910    [PLACEMENT] center center
2911  }
2912  ARTICLE NO. {
2913    [FONT NAME] GOTHIC
2914    [FONT SIZE] 10pt
2915    [CHARACTER PITCH] 12pt
2916    [OFFSET 1] 12pt
2917    [OFFSET 2] 0pt
2918    [FIRST LINE DISPLACEMENT] 0pt
2919    [CONNECTION WITH PREVIOUS ELEMENT] "¥n"
2920    [STRING INFORMATION] CONTENT
2921    [PLACEMENT] center start
2922  }
2923  ARTICLE STATEMENT {
2924    [FONT NAME] MING
2925    [FONT SIZE] 10pt
2926    [CHARACTER PITCH] 12pt
2927    [OFFSET 1] 12pt
2928    [OFFSET 2] 0pt
2929    [FIRST LINE DISPLACEMENT] 0pt
2930    [CONNECTION WITH PREVIOUS ELEMENT] " ■
2931    [STRING INFORMATION] CONTENT
2932    [PLACEMENT] center start
2933  }

FIG. 29

      ITEM NAME            USABLE SORT OF VALUE
3001  [ELEMENT NAME]       STRING
3002  [LEFT-HAND SPACE]    INTEGER (UNIT: NUMBER OF CHARACTERS)
3003  [RIGHT-HAND SPACE]   INTEGER (UNIT: NUMBER OF CHARACTERS)
3004  [FIRST-LINE INDENT]  INTEGER (UNIT: NUMBER OF CHARACTERS)
3005  [STRING CONDITION]   STRING
3006  [ARRANGEMENT]        RIGHT JUSTIFY OR LEFT JUSTIFY OR CENTERING OR EQUAL SPACE
3007  [PREVIOUS STRING]    STRING
3008  [NEXT STRING]        STRING

FIG. 30

      REQUIRED ITEM        INFORMATION IN LAYOUT DEFINITION
3101  [LEFT-HAND SPACE]    [OFFSET 1] / [CHARACTER PITCH]
3102  [RIGHT-HAND SPACE]   [OFFSET 2] / [CHARACTER PITCH]
3103  [FIRST-LINE INDENT]  {[OFFSET 1] + [FIRST-LINE INDENT]} / [CHARACTER PITCH]
3104  [STRING CONDITION]   [STRING INFORMATION]
3105  [ARRANGEMENT]        [PLACEMENT]
        RIGHT JUSTIFY        start
        LEFT JUSTIFY         end
        CENTERING            center
        EQUAL SPACE          justify
3106  [PREVIOUS STRING]    [CONNECTION WITH PREVIOUS ELEMENT]
3107  [NEXT STRING]        CONTENT IS OBTAINED BY USING STRING-CORRESPONDING
                           ELEMENT INFORMATION AND [CONNECTION WITH PREVIOUS
                           ELEMENT] (REFER TO SPECIFICATION)

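The conversions listed in FIG. 30 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names `parse_pt` and `required_items` are hypothetical, the layout definition is assumed to be available as a dictionary keyed by the item names of FIG. 28, integer division is assumed because FIG. 29 specifies integer values in units of characters, the FIG. 28 item [FIRST LINE DISPLACEMENT] is assumed to be the quantity added to [OFFSET 1] in row 3103, and "¥n" is assumed to denote a line feed. [ARRANGEMENT] and [NEXT STRING] are left out because their derivation is not fully specified on this sheet.

```python
from typing import Dict


def parse_pt(value: str) -> int:
    """Parse a size such as '12pt' into an integer number of points."""
    value = value.strip()
    if not value.endswith("pt"):
        raise ValueError(f"expected a size in points, got {value!r}")
    return int(value[:-2])


def required_items(element_name: str, layout: Dict[str, str]) -> Dict[str, object]:
    """Derive the FIG. 30 required items from one layout definition entry."""
    pitch = parse_pt(layout["CHARACTER PITCH"])
    offset1 = parse_pt(layout["OFFSET 1"])
    offset2 = parse_pt(layout["OFFSET 2"])
    displacement = parse_pt(layout["FIRST LINE DISPLACEMENT"])
    return {
        "ELEMENT NAME": element_name,
        # Spaces are expressed as a number of characters (assumed floor division).
        "LEFT-HAND SPACE": offset1 // pitch,
        "RIGHT-HAND SPACE": offset2 // pitch,
        "FIRST-LINE INDENT": (offset1 + displacement) // pitch,
        "STRING CONDITION": layout["STRING INFORMATION"],
        "PREVIOUS STRING": layout["CONNECTION WITH PREVIOUS ELEMENT"],
    }


# Values copied from the ARTICLE NO. entry of FIG. 28 ("\n" stands for "¥n").
ARTICLE_NO_LAYOUT = {
    "FONT NAME": "GOTHIC",
    "FONT SIZE": "10pt",
    "CHARACTER PITCH": "12pt",
    "OFFSET 1": "12pt",
    "OFFSET 2": "0pt",
    "FIRST LINE DISPLACEMENT": "0pt",
    "CONNECTION WITH PREVIOUS ELEMENT": "\n",
    "STRING INFORMATION": "CONTENT",
    "PLACEMENT": "center start",
}

if __name__ == "__main__":
    print(required_items("ARTICLE NO.", ARTICLE_NO_LAYOUT))
```

For the ARTICLE NO. entry, a 12pt offset at a 12pt character pitch gives a left-hand space and a first-line indent of one character each, and a right-hand space of zero.
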
FIG. 31

[Drawing; only reference numerals 3201 to 3205 are legible.]

FIG. 32

[Flowchart; reference numerals 3301 to 3307.
Legible box text, in reading order:]
  START
  READ STRING-CORRESPONDING ELEMENT INFORMATION
  DISPLAY STRING-CORRESPONDING ELEMENT;
    DISPLAY FORMAT CONDITION OF KEYWORD-CORRESPONDING ELEMENT
  USER ACTION
    CONTIGUITY CHECK BUTTON: PASS KEYWORD-CORRESPONDING NAME;
      ACTIVATE CONTIGUOUS ELEMENT CHECKING MODULE
    DOUBLE CLICK ELEMENT NAME: PASS CHARACTER-CORRESPOND ELEMENT NAME;
      ACTIVATE SUPPLEMENTARY EDITING MODULE
  RECEIVE FORMAT CONDITION (branch label: NG)
  SET STRING-CORRESPONDING ELEMENT AS KEYWORD-CORRESPONDING ELEMENT
  OUTPUT KEYWORD-CORRESPONDING ELEMENTS AND FORMAT CONDITIONS
    AS KEYWORD EXTRACTION RULE
  END






FIG. 33

[Screen images of the SUPPLEMENTARY INFORMATION EDITING MODULE INTERFACE:
INITIAL STATE (3401) and AFTER ENTERING STRING CONDITION (3402).
Legible field labels: DOCUMENT STRUCTURE ELEMENT NAME (ARTICLE NO.);
PREVIOUS STRING ("¥n"); NEXT STRING; STRING CONDITION; LEFT-HAND SPACE;
RIGHT-HAND SPACE; FIRST-LINE INDENT; ARRANGEMENT; EXIT.]






FIG. 34

[Flowchart; reference numerals 3501 to 3507.
Legible box text, in reading order:]
  READ STRING-CORRESPONDING ELEMENT NAME
  READ REQUIRED ITEM CONTENT OF READ STRING-CORRESPONDING ELEMENT
  DISPLAY REQUIRED ITEM CONTENT
  USER ACTION
    EDIT REQUIRED ITEM CONTENT
    STRING CONDITION INPUT BUTTON: ACTIVATE STRING CONDITION EDITING MODULE;
      RECEIVE STRING CONDITION
  GENERATE FORMAT CONDITION
  PASS FORMAT CONDITION AS RETURN ARGUMENT TO KEYWORD INFORMATION
    INDICATOR MODULE






FIG. 35

[Flowchart; reference numerals 3601 to 3611.
Legible box text, in reading order:]
  START: FORMAT CONDITION = STRING CONDITION (example: "ARTICLE NUM1")
  [PREVIOUS STRING] IS NON LINE FEED?
    YES: SURROUND FORMAT CONDITION BY [ ] AND ADD [PREVIOUS STRING]
         AND "+" JUST BEFORE IT
    NO:  ADD "^SPCx" AT START OF FORMAT CONDITION (x = FIRST-LINE INDENT)
  IF FORMAT CONDITION DOES NOT CONTAIN [ ], IT IS SURROUNDED BY [ ],
    AND [NEXT STRING] AND "+" ARE ADDED JUST AFTER IT
    (examples: "^SPC0 [ARTICLE NUM1] SPC1+"; "[ARTICLE NUM1] SPC1+")
  ADD "$" AT END OF FORMAT CONDITION
  [ARRANGEMENT] IS
    CENTERING:     ADD "C" AT START OF FORMAT CONDITION
    RIGHT JUSTIFY: "$" AT END OF FORMAT CONDITION AND ADD "SPCy$"
                   (y = RIGHT-HAND SPACE)
    EQUAL SPACE:   IF "^" OR "+" IS NOT PRESENT AT START OF FORMAT
                   CONDITION, "!" IS ADDED AT START
  RETURN PROCESS







FIG. 36

3701  READ AGGREGATION OF KEYWORD-CORRESPONDING ELEMENTS
3702  READ STRING-CORRESPONDING ELEMENT INFORMATION
3703  AGGREGATION OF NON-KEYWORD-CORRESPONDING ELEMENTS
        = ALL STRING-CORRESPONDING ELEMENTS
          - AGGREGATION OF KEYWORD-CORRESPONDING ELEMENTS
3704  NON-KEYWORD PRESENT IN NEXT ELEMENT OF
        NON-KEYWORD-CORRESPONDING ELEMENT?
3705  INFORM USER OF CONTIGUOUS NON-KEYWORD
3706  INFORM USER THAT THERE IS NO CONTIGUOUS NON-KEYWORD
      END






FIG. 37

3801  AGGREGATION OF KEYWORD-CORRESPONDING ELEMENTS
        = {OPENING TITLE, PROMULGATION DATE, TITLE, ARTICLE NO.,
           PARAGRAPH NO.}
3802  AGGREGATION OF NON-KEYWORD-CORRESPONDING ELEMENTS
        = {ESTABLISHED REGULATION NO., PROMULGATION STATEMENT,
           ARTICLE STATEMENT, PARAGRAPH STATEMENT}
3803  FROM FIG.28: PROMULGATION STATEMENT IS PRESENT IN NEXT ELEMENT OF
        ESTABLISHED REGULATION NO.
3804  INFORM USER THAT ESTABLISHED REGULATION NO. AND
        PROMULGATION STATEMENT ARE CONTIGUOUS
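
The contiguity check of FIG. 36, applied to the example of FIG. 37, can be sketched in Python as below. This is only an illustration under assumptions: the function name `contiguous_non_keywords` is hypothetical, and the contiguity relationship is assumed to be held as a table `NEXT_OF` mapping each string-corresponding element to the elements that may directly follow it. The entries of that table are an assumed reading of the sample document structure, not data taken from the specification.

```python
from typing import Dict, Iterable, List, Set, Tuple


def contiguous_non_keywords(
    all_elements: Iterable[str],
    keywords: Iterable[str],
    next_of: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return pairs of non-keyword elements where the second may follow the first."""
    # Non-keyword elements = all string-corresponding elements minus keywords.
    non_keywords = set(all_elements) - set(keywords)
    pairs: List[Tuple[str, str]] = []
    for elem in sorted(non_keywords):
        for nxt in sorted(next_of.get(elem, set())):
            if nxt in non_keywords:
                pairs.append((elem, nxt))
    return pairs


# Assumed contiguity relationship between string-corresponding elements.
NEXT_OF: Dict[str, Set[str]] = {
    "OPENING TITLE": {"PROMULGATION DATE"},
    "PROMULGATION DATE": {"ESTABLISHED REGULATION NO."},
    "ESTABLISHED REGULATION NO.": {"PROMULGATION STATEMENT", "TITLE"},
    "PROMULGATION STATEMENT": {"TITLE"},
    "TITLE": {"ARTICLE NO."},
    "ARTICLE NO.": {"ARTICLE STATEMENT"},
    "ARTICLE STATEMENT": {"PARAGRAPH NO.", "ARTICLE NO."},
    "PARAGRAPH NO.": {"PARAGRAPH STATEMENT"},
    "PARAGRAPH STATEMENT": {"PARAGRAPH NO.", "ARTICLE NO."},
}

# Keyword-corresponding elements as listed at 3801 in FIG. 37.
KEYWORDS = {"OPENING TITLE", "PROMULGATION DATE", "TITLE", "ARTICLE NO.", "PARAGRAPH NO."}

if __name__ == "__main__":
    for first, second in contiguous_non_keywords(NEXT_OF.keys(), KEYWORDS, NEXT_OF):
        print(f"{first} and {second} are contiguous non-keywords")
```

With the assumed table, the only pair reported is ESTABLISHED REGULATION NO. followed by PROMULGATION STATEMENT, which matches the notification shown at 3804.
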






